Introduction

I have recently been learning how to install and use PaddlePaddle across different NVIDIA graphics card driver versions, which led me to study how to install and uninstall CUDA and CUDNN on Ubuntu. I am documenting the steps here both for my own reference and to help others. The examples in this article uninstall and then install CUDA 11.8 and CUDNN 8.9.6.

Installing NVIDIA Graphics Card Driver

Disable Nouveau Driver

sudo vim /etc/modprobe.d/blacklist.conf

Add the following lines at the end of the file:

blacklist nouveau
options nouveau modeset=0

Then execute:

sudo update-initramfs -u

After rebooting, run the following command. If there is no output, the nouveau driver has been successfully disabled:

lsmod | grep nouveau
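
The same check can be scripted. The following is a minimal sketch, assuming a Linux system where the kernel lists loaded modules in /proc/modules (the file that lsmod reads). The function names are illustrative and not part of any NVIDIA tooling.

```python
from pathlib import Path


def is_module_loaded(modules_text: str, name: str) -> bool:
    """Return True if `name` is listed as a module in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first field of each line is the module name.
        if fields and fields[0] == name:
            return True
    return False


def nouveau_loaded(proc_modules: str = "/proc/modules") -> bool:
    """Read the kernel's module list and look for the nouveau driver."""
    path = Path(proc_modules)
    if not path.exists():
        return False  # not Linux, or /proc is unavailable
    return is_module_loaded(path.read_text(), "nouveau")


if __name__ == "__main__":
    print("nouveau is still loaded" if nouveau_loaded() else "nouveau is not loaded")
```

Unlike the grep command, this matches the module name exactly, so modules that merely depend on nouveau are not counted.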

Downloading the Driver

The official download address is the NVIDIA Driver Download Page. Select the driver version that matches your graphics card. This article uses an RTX 2080 Ti as the example.
(Screenshot: graphics card driver download selection)

After downloading, you will get an installation package. The filename may vary by version:

NVIDIA-Linux-x86_64-535.113.01.run

Uninstalling Old Drivers

All operations below must be performed in a text console. Use the following shortcut to switch to the console and log in. Note that this leaves the desktop session, so the screen goes black before the login prompt appears. Skip this step if you are logged in remotely:

Ctrl+Alt+F1

Execute the following command to stop the X Window service (here, the lightdm display manager); otherwise the driver installation will fail:

sudo service lightdm stop

Execute the following three commands to uninstall the existing graphics card driver:

sudo apt-get remove --purge 'nvidia*'
sudo chmod +x NVIDIA-Linux-x86_64-535.113.01.run
sudo ./NVIDIA-Linux-x86_64-535.113.01.run --uninstall

Installing the New Driver

Directly execute the driver file to start installation, following the default prompts:

sudo ./NVIDIA-Linux-x86_64-535.113.01.run

Start the X-Window service:

sudo service lightdm start

Finally, reboot the system:

sudo reboot

Note: If you get stuck in a login loop after rebooting (the login screen keeps reappearing after you enter your password), the installed driver version is probably incompatible with your graphics card. Download and install the correct driver version for your card.

Uninstalling CUDA

Uninstalling CUDA takes a single command: run the uninstaller script that ships with CUDA. The script lives under the directory of the installed version, so adjust the path to match your CUDA version:

sudo /usr/local/cuda-11.8/bin/cuda-uninstaller

After uninstallation, there may be residual folders. Since we installed CUDA 11.8, delete these folders as well:

sudo rm -rf /usr/local/cuda-11.8/
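
To see which CUDA directories are left before or after deleting them, a short script can list them. This is a minimal sketch, assuming CUDA was installed under the default /usr/local prefix. The function name find_cuda_entries is hypothetical.

```python
from pathlib import Path
from typing import List


def find_cuda_entries(root: str = "/usr/local") -> List[str]:
    """List entries under `root` whose names start with 'cuda'."""
    base = Path(root)
    if not base.is_dir():
        return []
    # Includes versioned directories such as cuda-11.8 and the cuda symlink.
    return sorted(p.name for p in base.iterdir() if p.name.startswith("cuda"))


if __name__ == "__main__":
    leftovers = find_cuda_entries()
    print("Remaining CUDA entries:", leftovers if leftovers else "none")
```

An empty result suggests that nothing is left under that prefix. Installations in other locations are not covered by this check.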

This completes the uninstallation of CUDA.

Installing CUDA

The CUDA and CUDNN versions to be installed are:
- CUDA 11.8
- CUDNN 8.9.6

All subsequent installation steps are performed as the root user.

Downloading and Installing CUDA

You can download the CUDA version that matches your system from the official CUDA Download Page.
(Screenshot: CUDA download page)

After downloading, grant execution permissions to the file:

chmod +x cuda_11.8.0_520.61.05_linux.run

Run the installer to start the installation:

./cuda_11.8.0_520.61.05_linux.run

When prompted to accept the license agreement, enter accept:

┌──────────────────────────────────────────────────────────────────────────────┐
│  End User License Agreement                                                  │
│  --------------------------                                                  │
│                                                                              │
│  NVIDIA Software License Agreement and CUDA Supplement to                    │
│  Software License Agreement. Last updated: October 8, 2021                   │
│                                                                              │
│  The CUDA Toolkit End User License Agreement applies to the                  │
│  NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA                    │
│  Display Driver, NVIDIA Nsight tools (Visual Studio Edition),                │
│  and the associated documentation on CUDA APIs, programming                  │
│  model and development tools. If you do not agree with the                   │
│  terms and conditions of the license agreement, then do not                  │
│  download or use the software.                                               │
│                                                                              │
│  Last updated: October 8, 2021.                                              │
│                                                                              │
│  Do you accept the above EULA? (accept/decline/quit):                         │
│  accept                                                                       │
└──────────────────────────────────────────────────────────────────────────────┘

After accepting the agreement, configure the installation options. Use the arrow keys to navigate and the Enter key to select/deselect. Important: Uncheck the driver installation option since we already installed the graphics driver. Then navigate to Install and press Enter to start installation:

┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer                                                               │
│ - [ ] Driver                                                                 │
│      [ ] 520.61.05                                                           │
│ + [X] CUDA Toolkit 11.8                                                      │
│   [X] CUDA Demo Suite 11.8                                                   │
│   [X] CUDA Documentation 11.8                                                │
│ - [ ] Kernel Objects                                                         │
│      [ ] nvidia-fs                                                           │
│   Options                                                                    │
│   Install                                                                    │
│                                                                              │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘
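
Both installer filenames embed a driver version: 535.113.01 for the driver package and 520.61.05 for the driver bundled with the CUDA runfile. The sketch below compares the two. It assumes that the bundled version is a reasonable reference point for "new enough", which is a simplification. NVIDIA's release notes remain the authoritative source for minimum driver versions, and the helper names are illustrative.

```python
import re
from typing import Tuple


def version_tuple(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '535.113.01' into (535, 113, 1)."""
    return tuple(int(part) for part in text.split("."))


def driver_version_from_runfile(filename: str) -> Tuple[int, ...]:
    """Extract the driver version embedded in an NVIDIA runfile name."""
    # Driver packages look like NVIDIA-Linux-x86_64-535.113.01.run;
    # CUDA runfiles look like cuda_11.8.0_520.61.05_linux.run.
    match = re.search(r"(\d{3}\.\d+(?:\.\d+)?)(?:_linux)?\.run$", filename)
    if match is None:
        raise ValueError(f"no driver version found in {filename!r}")
    return version_tuple(match.group(1))


installed = driver_version_from_runfile("NVIDIA-Linux-x86_64-535.113.01.run")
bundled = driver_version_from_runfile("cuda_11.8.0_520.61.05_linux.run")
print("installed driver is at least the bundled one:", installed >= bundled)
```

With the filenames used in this article the comparison is true, which is consistent with skipping the bundled driver in the installer menu.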

After installation, configure the environment variables. Open ~/.bashrc (for example with vim ~/.bashrc) and add the following lines at the end:

export CUDA_HOME=/usr/local/cuda-11.8
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export PATH=$PATH:$CUDA_HOME/bin

Make the changes take effect with:

source ~/.bashrc
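
To confirm that the variables are visible in the current shell, they can be inspected from Python. This is a minimal sketch. The function check_cuda_env is hypothetical, and the default cuda_home value assumes the CUDA 11.8 install path used in this article.

```python
import os
from typing import List, Mapping


def check_cuda_env(env: Mapping[str, str],
                   cuda_home: str = "/usr/local/cuda-11.8") -> List[str]:
    """Return a list of problems; an empty list means the variables look right."""
    problems = []
    # Split the colon-separated lists and ignore trailing slashes.
    path_entries = [e.rstrip("/") for e in env.get("PATH", "").split(":")]
    lib_entries = [e.rstrip("/") for e in env.get("LD_LIBRARY_PATH", "").split(":")]
    if env.get("CUDA_HOME", "").rstrip("/") != cuda_home:
        problems.append(f"CUDA_HOME is not {cuda_home}")
    if f"{cuda_home}/bin" not in path_entries:
        problems.append(f"{cuda_home}/bin is missing from PATH")
    if f"{cuda_home}/lib64" not in lib_entries:
        problems.append(f"{cuda_home}/lib64 is missing from LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    issues = check_cuda_env(os.environ)
    print("environment looks fine" if not issues else "\n".join(issues))
```

Run it in the same shell where ~/.bashrc was sourced, because a new environment is only picked up by processes started afterwards.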

Verify the installation with:

nvcc -V

Expected output:

test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
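
If the version needs to be checked from a script, the release number can be extracted from this output. The sketch below assumes the "release X.Y" wording shown above. The function name nvcc_release is illustrative.

```python
import re
import shutil
import subprocess
from typing import Optional


def nvcc_release(output: str) -> Optional[str]:
    """Extract the release number (for example '11.8') from `nvcc -V` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    if shutil.which("nvcc") is None:
        print("nvcc was not found on PATH")
    else:
        result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
        print("CUDA release:", nvcc_release(result.stdout))
```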

Testing Installation Success

Execute the following commands to verify:

cd /usr/local/cuda-11.8/extras/demo_suite/
./deviceQuery

Expected output:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 2 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 2080 Ti"
  CUDA Driver Version / Runtime Version          12.2 / 11.8
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 22189 MBytes (23267246080 bytes)
  (68) Multiprocessors, ( 64) CUDA Cores/MP:     4352 CUDA Cores
  GPU Max Clock rate:                            1545 MHz (1.54 GHz)
  Memory Clock rate:                             7000 Mhz
  Memory Bus Width:                              352-bit
  L2 Cache Size:                                 5767168 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
················································
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 2, Device0 = NVIDIA GeForce RTX 2080 Ti, Device1 = NVIDIA GeForce RTX 2080 Ti
Result = PASS
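
The last two lines of this output carry the key facts: the CUDA version supported by the driver (12.2), the runtime version (11.8), the number of devices, and the result. The following sketch parses them, assuming the summary format shown above. The function names are hypothetical.

```python
import re
from typing import Dict


def parse_device_query_summary(output: str) -> Dict[str, str]:
    """Pull driver version, runtime version, device count and result from deviceQuery output."""
    patterns = {
        "driver": r"CUDA Driver Version = ([\d.]+)",
        "runtime": r"CUDA Runtime Version = ([\d.]+)",
        "num_devs": r"NumDevs = (\d+)",
        "result": r"Result = (\w+)",
    }
    fields = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, output)
        if match:
            fields[key] = match.group(1)
    return fields


def driver_supports_runtime(summary: Dict[str, str]) -> bool:
    """The CUDA version reported by the driver should not be older than the runtime."""
    driver = tuple(int(p) for p in summary["driver"].split("."))
    runtime = tuple(int(p) for p in summary["runtime"].split("."))
    return driver >= runtime


sample = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, "
    "CUDA Runtime Version = 11.8, NumDevs = 2\n"
    "Result = PASS\n"
)
summary = parse_device_query_summary(sample)
print(summary, driver_supports_runtime(summary))
```

A driver that reports a newer CUDA version than the runtime, as in this article, is the normal case.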

Downloading and Installing CUDNN

Visit the NVIDIA cuDNN Download page. After logging in, select cuDNN Library for Linux.
(Screenshot: CUDNN download selection)

Download the compressed package, e.g.:

cudnn-linux-x86_64-8.9.6.50_cuda11-archive.tar.xz

Extract the package:

tar -xf cudnn-linux-x86_64-8.9.6.50_cuda11-archive.tar.xz

You’ll get the following files:

cudnn-linux-x86_64-8.9.6.50_cuda11-archive/include/
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/lib/
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/LICENSE

Copy these files to the CUDA directory:

cp -P cudnn-linux-x86_64-8.9.6.50_cuda11-archive/lib/* /usr/local/cuda-11.8/lib64/
cp cudnn-linux-x86_64-8.9.6.50_cuda11-archive/include/* /usr/local/cuda-11.8/include/

Verify the CUDNN version with:

cat /usr/local/cuda-11.8/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
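
The grep command prints the three #define lines that make up the version. As a sketch of the same check in Python, the function below reads those macros from the header text. It assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers. The function name and the sample header text are illustrative.

```python
import re
from typing import Optional, Tuple


def cudnn_version_from_header(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Read (major, minor, patch) from the text of cudnn_version.h."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as: #define CUDNN_MAJOR 8
        match = re.search(rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return (values[0], values[1], values[2])


sample_header = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 9
#define CUDNN_PATCHLEVEL 6
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""
print(cudnn_version_from_header(sample_header))
```

To check a real installation, pass the contents of /usr/local/cuda-11.8/include/cudnn_version.h instead of the sample text.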

Testing Installation Results

At this point, CUDA 11.8 and CUDNN 8.9.6 are installed. Test with PyTorch’s GPU version to confirm functionality:

Install PyTorch:

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
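
Before running a full training script, a quick report of what PyTorch sees can save time. This is a minimal sketch: the function gpu_report is hypothetical, and it only uses torch.cuda.is_available(), torch.version.cuda, torch.backends.cudnn.version() and torch.cuda.get_device_name(). Note that the wheels from this index bundle their own CUDA runtime and CUDNN libraries, so the versions reported here describe the PyTorch build rather than the files copied into /usr/local.

```python
def gpu_report() -> dict:
    """Collect basic facts about the PyTorch build and the visible GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,            # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # integer, or None if unavailable
    }
    if report["cuda_available"]:
        report["device_0"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If cuda_available is False, the training script below will fail when it moves the model to the GPU, so check the driver installation first.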

Test with the following Python code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torchvision import datasets, transforms


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))

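def main():
    # Let cuDNN benchmark and pick the fastest convolution algorithms.
    cudnn.benchmark = True
    torch.manual_seed(1)
    # Stop early with a clear message if PyTorch cannot see the GPU.
    if not torch.cuda.is_available():
        raise SystemExit("CUDA is not available; check the driver and CUDA installation.")
    device = torch.device("cuda")
    kwargs = {'num_workers': 1, 'pin_memory': True}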
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('../data', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=64, shuffle=True, **kwargs)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

    for epoch in range(1, 11):
        train(model, device, train_loader, optimizer, epoch)


if __name__ == '__main__':
    main()

The output should show the training loss decreasing:
```text
Train Epoch: 1 [0/60000 (0%)] Loss: 2.365850
Train Epoch: 1 [640/60000 (1%)] Loss: 2.305295
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.301407
Train Epoch: 1 [1920/60000 (3%)] Loss: 2.316538
Train Epoch: 1 [2560/60000 (4%)] Loss: 2.255809
Train Epoch: 1 [3200/60000 (
```

Xiaoye