# Table of Contents
@[toc]
# Introduction
Recently, while learning how to install and use PaddlePaddle across different NVIDIA graphics card driver versions, I also had to study how to install and uninstall CUDA and CUDNN on Ubuntu. I am documenting the steps here both for my own future reference and in the hope that they help others. This article uses CUDA 11.8 and CUDNN 8.9.6 as the running example for both uninstallation and installation.
# Installing the NVIDIA Graphics Card Driver
## Disable the Nouveau Driver
sudo vim /etc/modprobe.d/blacklist.conf
Add the following lines at the end of the file:
blacklist nouveau
options nouveau modeset=0
Then execute:
sudo update-initramfs -u
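Reboot so the blacklist takes effect:
sudo reboot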
After rebooting, run the following command. If there is no output, the nouveau driver has been successfully disabled:
lsmod | grep nouveau
## Downloading the Driver
The official download address is: NVIDIA Driver Download Page. Select the appropriate driver version according to your graphics card, for example an RTX 2080 Ti.
After downloading, you will get an installation package. The filename may vary by version:
NVIDIA-Linux-x86_64-535.113.01.run
## Uninstalling Old Drivers
All operations below need to be performed in the command-line interface. Use the following shortcut to enter the command-line interface and log in (note: this will cause a black screen if you’re using a desktop environment; no need if using remote login):
Ctrl+Alt+F1
Execute the following command to disable the X-Window service, otherwise the driver installation will fail:
sudo service lightdm stop
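If your Ubuntu release uses GDM3 (or another display manager) instead of LightDM, stop that service instead; for example, on GNOME desktops (assuming the default gdm3 service name):
sudo service gdm3 stop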
Execute the following three commands to uninstall the existing graphics card driver:
sudo apt-get remove --purge nvidia*
sudo chmod +x NVIDIA-Linux-x86_64-535.113.01.run
sudo ./NVIDIA-Linux-x86_64-535.113.01.run --uninstall
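You can confirm the removal before proceeding; the following should produce no output once the old driver is gone (a module that is still listed will be cleared by the reboot at the end of the installation):
lsmod | grep nvidia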
## Installing the New Driver
Directly execute the driver file to start installation, following the default prompts:
sudo ./NVIDIA-Linux-x86_64-535.113.01.run
Start the X-Window service:
sudo service lightdm start
Finally, reboot the system:
reboot
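After rebooting, verify that the new driver is loaded:
nvidia-smi
This should list your GPU(s) along with the driver version (535.113.01 in this example).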
Note: If you are stuck in a login loop after rebooting, the most likely cause is an incompatible driver version; download and install the correct driver for your graphics card.
# Uninstalling CUDA
Uninstalling CUDA takes a single command: run the uninstaller script that ships with CUDA, using the path that matches your installed version:
sudo /usr/local/cuda-11.8/bin/cuda-uninstaller
After uninstallation, there may be residual folders. Since we installed CUDA 11.8, delete these folders as well:
sudo rm -rf /usr/local/cuda-11.8/
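As a quick sanity check, nvcc should no longer be found and /usr/local should contain no cuda-11.8 directory:
ls /usr/local/
nvcc -V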
This completes the uninstallation of CUDA.
# Installing CUDA
The CUDA and CUDNN versions to be installed are:
- CUDA 11.8
- CUDNN 8.9.6
All subsequent installation steps are performed as the root user.
## Downloading and Installing CUDA
You can download the appropriate CUDA version for your system from the official website: CUDA Download Page. Select your system's options and choose the runfile (local) installer type, which matches the commands below.
After downloading, grant execution permissions to the file:
chmod +x cuda_11.8.0_520.61.05_linux.run
Run the installer to start the installation:
./cuda_11.8.0_520.61.05_linux.run
When prompted to accept the license agreement, enter accept:
┌──────────────────────────────────────────────────────────────────────────────┐
│ End User License Agreement │
│ -------------------------- │
│ │
│ NVIDIA Software License Agreement and CUDA Supplement to │
│ Software License Agreement. Last updated: October 8, 2021 │
│ │
│ The CUDA Toolkit End User License Agreement applies to the │
│ NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA │
│ Display Driver, NVIDIA Nsight tools (Visual Studio Edition), │
│ and the associated documentation on CUDA APIs, programming │
│ model and development tools. If you do not agree with the │
│ terms and conditions of the license agreement, then do not │
│ download or use the software. │
│ │
│ Last updated: October 8, 2021. │
│ │
│ Do you accept the above EULA? (accept/decline/quit): │
│ accept │
└──────────────────────────────────────────────────────────────────────────────┘
After accepting the agreement, configure the installation options. Use the arrow keys to navigate and the Enter key to select/deselect. Important: Uncheck the driver installation option since we already installed the graphics driver. Then navigate to Install and press Enter to start installation:
┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer │
│ - [ ] Driver │
│ [ ] 520.61.05 │
│ + [X] CUDA Toolkit 11.8 │
│ [X] CUDA Demo Suite 11.8 │
│ [X] CUDA Documentation 11.8 │
│ - [ ] Kernel Objects │
│ [ ] nvidia-fs │
│ Options │
│ Install │
│ │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘
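As an aside, the runfile also supports an unattended, toolkit-only installation via command-line flags (a sketch; run the installer with --help to confirm the flags available in your version):
sudo ./cuda_11.8.0_520.61.05_linux.run --silent --toolkit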
After installation, configure the environment variables. Open ~/.bashrc (for example with vim ~/.bashrc) and append the following lines:
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:/usr/local/cuda-11.8/extras/CUPTI/lib64
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$PATH:$CUDA_HOME/bin
Make the changes take effect with:
source ~/.bashrc
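You can confirm the variables took effect:
echo $CUDA_HOME
which nvcc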
Verify the installation with:
nvcc -V
Expected output:
test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
## Testing Installation Success
Execute the following commands to verify:
cd /usr/local/cuda-11.8/extras/demo_suite/
./deviceQuery
Expected output:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 2080 Ti"
CUDA Driver Version / Runtime Version 12.2 / 11.8
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 22189 MBytes (23267246080 bytes)
(68) Multiprocessors, ( 64) CUDA Cores/MP: 4352 CUDA Cores
GPU Max Clock rate: 1545 MHz (1.54 GHz)
Memory Clock rate: 7000 Mhz
Memory Bus Width: 352-bit
L2 Cache Size: 5767168 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
················································
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 2, Device0 = NVIDIA GeForce RTX 2080 Ti, Device1 = NVIDIA GeForce RTX 2080 Ti
Result = PASS
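The demo suite also ships a bandwidthTest binary in the same directory; running it is another quick check and should likewise end with Result = PASS:
./bandwidthTest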
## Downloading and Installing CUDNN
Visit the CUDNN download page: NVIDIA cuDNN Download. After logging in, select cuDNN Library for Linux (x86_64) for your CUDA version.
Download the compressed package, e.g.:
cudnn-linux-x86_64-8.9.6.50_cuda11-archive.tar.xz
Extract the package:
tar -xf cudnn-linux-x86_64-8.9.6.50_cuda11-archive.tar.xz
You’ll get the following files:
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/include/
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/lib/
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/LICENSE
Copy these files to the CUDA directory:
cp cudnn-linux-x86_64-8.9.6.50_cuda11-archive/lib/* /usr/local/cuda-11.8/lib64/
cp cudnn-linux-x86_64-8.9.6.50_cuda11-archive/include/* /usr/local/cuda-11.8/include/
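NVIDIA's installation guide also recommends making the copied files readable by all users; assuming the default paths used above:
chmod a+r /usr/local/cuda-11.8/include/cudnn*.h /usr/local/cuda-11.8/lib64/libcudnn*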
Verify the CUDNN version with:
cat /usr/local/cuda-11.8/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
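For cuDNN 8.9.6 the output should include the version macros, roughly like this (the exact output may contain additional matching lines):
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 6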
## Testing Installation Results
At this point, CUDA 11.8 and CUDNN 8.9.6 are installed. Test with PyTorch’s GPU version to confirm functionality:
Install PyTorch:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
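Before running the full training script, a one-line check confirms that PyTorch can see the GPU and cuDNN (it should print True followed by a cuDNN version number):
python -c "import torch; print(torch.cuda.is_available(), torch.backends.cudnn.version())"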
Test with the following Python code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # Move the batch to the GPU
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))


def main():
    # Let cuDNN pick the fastest convolution algorithms
    cudnn.benchmark = True
    torch.manual_seed(1)
    device = torch.device("cuda")
    kwargs = {'num_workers': 1, 'pin_memory': True}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=64, shuffle=True, **kwargs)
    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    # Train for 10 epochs
    for epoch in range(1, 11):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == '__main__':
    main()
Expected output should show training loss decreasing:
Train Epoch: 1 [0/60000 (0%)] Loss: 2.365850
Train Epoch: 1 [640/60000 (1%)] Loss: 2.305295
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.301407
Train Epoch: 1 [1920/60000 (3%)] Loss: 2.316538
Train Epoch: 1 [2560/60000 (4%)] Loss: 2.255809
Train Epoch: 1 [3200/60000 (