目錄¶
@[toc]
前言¶
最近在學習PaddlePaddle在各個顯卡驅動版本的安裝和使用,所以同時也學習如何在Ubuntu安裝和卸載CUDA和CUDNN,在學習過程中,順便記錄學習過程。在供大家學習的同時,也在加強自己的記憶。本文章以卸載CUDA 11.8 和 CUDNN 8.9.6 爲例,以安裝CUDA 11.8 和 CUDNN 8.9.6 爲例。
安裝顯卡驅動¶
禁用nouveau驅動¶
sudo vim /etc/modprobe.d/blacklist.conf
在文本最後添加:
blacklist nouveau
options nouveau modeset=0
然後執行:
sudo update-initramfs -u
重啓後,執行以下命令,如果沒有屏幕輸出,說明禁用nouveau成功:
lsmod | grep nouveau
下載驅動¶
官網下載地址:https://www.nvidia.cn/Download/index.aspx?lang=cn ,根據自己顯卡的情況下載對應版本的顯卡驅動,比如筆者的顯卡是RTX2080ti:

下載完成之後會得到一個安裝包,不同版本文件名可能不一樣:
NVIDIA-Linux-x86_64-535.113.01.run
卸載舊驅動¶
以下操作都需要在命令界面操作,執行以下快捷鍵進入命令界面,並登錄(注意:如果是桌面,操作這個會黑屏,如果是遠程登錄,不需要執行這條命令):
Ctrl-Alt+F1
執行以下命令禁用X-Window服務,否則無法安裝顯卡驅動:
sudo service lightdm stop
執行以下三條命令卸載原有顯卡驅動:
sudo apt-get remove --purge nvidia*
sudo chmod +x NVIDIA-Linux-x86_64-410.93.run
sudo ./NVIDIA-Linux-x86_64-535.113.01.run --uninstall
安裝新驅動¶
直接執行驅動文件即可安裝新驅動,一直默認即可:
sudo ./NVIDIA-Linux-x86_64-410.93.run
執行以下命令啓動X-Window服務
sudo service lightdm start
最後執行重啓命令,重啓系統即可:
reboot
注意: 如果系統重啓之後出現重複登錄的情況,多數情況下都是安裝了錯誤版本的顯卡驅動。需要下載對應本身機器安裝的顯卡版本。
卸載CUDA¶
卸載CUDA很簡單,一條命令就可以了,主要執行的是CUDA自帶的卸載腳本,讀者要根據自己的cuda版本找到卸載腳本:
sudo //usr/local/cuda-11.8/bin/cuda-uninstaller
卸載之後,還有一些殘留的文件夾,之前安裝的是CUDA 11.8。可以一併刪除:
sudo rm -rf /usr/local/cuda-11.8/
這樣就算卸載完了CUDA。
安裝CUDA¶
安裝的CUDA和CUDNN版本:
- CUDA 11.8
- CUDNN 8.9.6
接下來的安裝步驟都是在root用戶下操作的。
下載和安裝CUDA¶
我們可以在官網:CUDA下載頁面,
下載符合自己系統版本的CUDA。頁面如下:

下載完成之後,給文件賦予執行權限:
chmod +x cuda_11.8.0_520.61.05_linux.run
執行安裝包,開始安裝:
./cuda_11.8.0_520.61.05_linux.run
開始安裝之後,需要閱讀說明,可以直接輸入accept同意:
┌──────────────────────────────────────────────────────────────────────────────┐
│ End User License Agreement │
│ -------------------------- │
│ │
│ NVIDIA Software License Agreement and CUDA Supplement to │
│ Software License Agreement. Last updated: October 8, 2021 │
│ │
│ The CUDA Toolkit End User License Agreement applies to the │
│ NVIDIA CUDA Toolkit, the NVIDIA CUDA Samples, the NVIDIA │
│ Display Driver, NVIDIA Nsight tools (Visual Studio Edition), │
│ and the associated documentation on CUDA APIs, programming │
│ model and development tools. If you do not agree with the │
│ terms and conditions of the license agreement, then do not │
│ download or use the software. │
│ │
│ Last updated: October 8, 2021. │
│ │
│ │
│ Preface │
│ ------- │
│ │
│──────────────────────────────────────────────────────────────────────────────│
│ Do you accept the above EULA? (accept/decline/quit): │
│ accept │
└──────────────────────────────────────────────────────────────────────────────┘
同意說明之後,可以開始安裝,可以通過上下鍵移動,回車鍵選擇和取消。這裏要注意取消勾選安裝驅動,因爲我們已經安裝過驅動了。然後移動到Install回車開始安裝即可。
┌──────────────────────────────────────────────────────────────────────────────┐
│ CUDA Installer │
│ - [ ] Driver │
│ [ ] 520.61.05 │
│ + [X] CUDA Toolkit 11.8 │
│ [X] CUDA Demo Suite 11.8 │
│ [X] CUDA Documentation 11.8 │
│ - [ ] Kernel Objects │
│ [ ] nvidia-fs │
│ Options │
│ Install │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
│ Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options │
└──────────────────────────────────────────────────────────────────────────────┘
安裝完成之後,可以配置他們的環境變量,在vim ~/.bashrc的最後加上以下配置信息:
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:/usr/local/cuda/extras/CPUTI/lib64
export CUDA_HOME=/usr/local/cuda-11.8
export PATH=$PATH:$LD_LIBRARY_PATH:$CUDA_HOME/bin
最後使用命令source ~/.bashrc使它生效。
可以使用命令nvcc -V查看安裝的版本信息:
test@test:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
測試安裝是否成功¶
執行以下幾條命令:
/usr/local/cuda-11.8/extras/demo_suite/
./deviceQuery
正常情況下輸出:
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 2080 Ti"
CUDA Driver Version / Runtime Version 12.2 / 11.8
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 22189 MBytes (23267246080 bytes)
(68) Multiprocessors, ( 64) CUDA Cores/MP: 4352 CUDA Cores
GPU Max Clock rate: 1545 MHz (1.54 GHz)
Memory Clock rate: 7000 Mhz
Memory Bus Width: 352-bit
L2 Cache Size: 5767168 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
················································
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 11.8, NumDevs = 2, Device0 = NVIDIA GeForce RTX 2080 Ti, Device1 = NVIDIA GeForce RTX 2080 Ti
Result = PASS
下載和安裝CUDNN¶
進入到CUDNN的下載官網:https://developer.nvidia.com/rdp/cudnn-download ,然點擊Download開始選擇下載版本,當然在下載之前還有登錄,選擇版本界面如下,我們選擇cuDNN Library for Linux:

下載之後是一個壓縮包,如下:
cudnn-linux-x86_64-8.9.6.50_cuda11-archive.tar.xz
然後對它進行解壓,命令如下:
tar -xf cudnn-linux-x86_64-8.9.6.50_cuda11-archive.tar.xz
解壓之後可以得到兩個文件夾:
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/include/
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/lib/
cudnn-linux-x86_64-8.9.6.50_cuda11-archive/LICENSE
使用以下兩條命令複製這些文件到CUDA目錄下:
cp cudnn-linux-x86_64-8.9.6.50_cuda11-archive/lib/* /usr/local/cuda-11.8/lib64/
cp cudnn-linux-x86_64-8.9.6.50_cuda11-archive/include/* /usr/local/cuda-11.8/include/
拷貝完成之後,可以使用以下命令查看CUDNN的版本信息:
cat /usr/local/cuda-11.8/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
測試安裝結果¶
到這裏就已經完成了CUDA 11.8 和 CUDNN 8.9.6 的安裝。可以安裝對應的Pytorch的GPU版本測試是否可以正常使用了。安裝如下:
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
然後使用以下的程序測試安裝情況:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.backends.cudnn as cudnn
from torchvision import datasets, transforms
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
self.conv2_drop = nn.Dropout2d()
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, x):
x = F.relu(F.max_pool2d(self.conv1(x), 2))
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(-1, 320)
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
x = self.fc2(x)
return F.log_softmax(x, dim=1)
def train(model, device, train_loader, optimizer, epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % 10 == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
def main():
cudnn.benchmark = True
torch.manual_seed(1)
device = torch.device("cuda")
kwargs = {'num_workers': 1, 'pin_memory': True}
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=64, shuffle=True, **kwargs)
model = Net().to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
for epoch in range(1, 11):
train(model, device, train_loader, optimizer, epoch)
if __name__ == '__main__':
main()
如果正常輸出一下以下信息,證明已經安裝成了:
Train Epoch: 1 [0/60000 (0%)] Loss: 2.365850
Train Epoch: 1 [640/60000 (1%)] Loss: 2.305295
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.301407
Train Epoch: 1 [1920/60000 (3%)] Loss: 2.316538
Train Epoch: 1 [2560/60000 (4%)] Loss: 2.255809
Train Epoch: 1 [3200/60000 (5%)] Loss: 2.224511
Train Epoch: 1 [3840/60000 (6%)] Loss: 2.216569
Train Epoch: 1 [4480/60000 (7%)] Loss: 2.181396
參考資料¶
- https://developer.nvidia.com
- https://www.cnblogs.com/luofeel/p/8654964.html
]