NVIDIA Driver and CUDA Installation

ONNX Runtime and CUDA Version Compatibility

[Image: onnxruntime-cuda-version-mapping]

Prerequisites

Libraries required on Ubuntu and Debian:

sudo apt update
sudo apt install net-tools
sudo apt install vim wget curl git
sudo apt install gcc make build-essential 
sudo apt install -y libssl-dev zlib1g-dev
sudo apt install -y libbz2-dev libreadline-dev libsqlite3-dev llvm
sudo apt install -y libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev
sudo apt install -y libgdbm-dev libdb-dev libpcap-dev libexpat1-dev libc6-dev

NVIDIA Driver Installation

Check the GPU model:

lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 3GB] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
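
The model name needed for the driver search is the bracketed part of the VGA line. As a minimal sketch, assuming the output format shown above, the following Python helper (the name `parse_nvidia_gpus` is hypothetical, not part of any tool) extracts it and skips the audio function of the same card:

```python
import re


def parse_nvidia_gpus(lspci_output):
    """Return the bracketed model names of NVIDIA display controllers."""
    models = []
    for line in lspci_output.splitlines():
        # Keep only NVIDIA display devices; the HDMI audio function is skipped.
        if "NVIDIA" not in line:
            continue
        if not re.search(r"VGA compatible controller|3D controller", line):
            continue
        match = re.search(r"\[([^\]]+)\]", line)
        if match:
            models.append(match.group(1))
    return models


# Sample taken from the lspci output shown in this guide.
sample = (
    "03:00.0 VGA compatible controller: NVIDIA Corporation GP104 "
    "[GeForce GTX 1060 3GB] (rev a1)\n"
    "03:00.1 Audio device: NVIDIA Corporation GP104 "
    "High Definition Audio Controller (rev a1)"
)
print(parse_nvidia_gpus(sample))
```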

Then download the matching driver from the NVIDIA website: https://www.nvidia.cn/drivers/lookup/

[Image: nvidia-driver-search]

Run the installer (invoked through sh, so the downloaded file does not need the executable bit):

sudo sh ./NVIDIA-1060-Linux-x86_64-550.100.run

After installation, reboot the machine, then run nvidia-smi to confirm that the driver was installed successfully.

[Image: nvidia-driver]

CUDA Installation

Download the CUDA Toolkit and cuDNN from the NVIDIA website.

CUDA https://developer.nvidia.com/cuda-toolkit-archive

cuDNN https://developer.nvidia.com/rdp/cudnn-archive

After downloading, upload the installer to the target directory and run it:

sudo sh cuda_11.4.0_470.57.02_linux.run

During installation, type accept to accept the license agreement, then press Enter or choose OK at the remaining prompts.

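In the list of components to install, deselect the driver (one is already installed; press Space to clear a selection) and keep the other components selected, i.e. n y y y.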

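When installation finishes, output like the following indicates success. The "Incomplete installation" warning is expected, because the driver component was deliberately skipped.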

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.4/
Samples:  Installed in /home/yiidata/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.4/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 470.00 is required for CUDA 11.4 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
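
The summary states that CUDA 11.4 needs a driver of at least version 470.00. A minimal sketch of that check in Python follows; it assumes `nvidia-smi` is on PATH, and the helper names are hypothetical. The `--query-gpu=driver_version --format=csv,noheader` options are standard `nvidia-smi` query flags.

```python
import shutil
import subprocess


def version_tuple(version):
    """Turn '550.100' into (550, 100) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_meets_minimum(installed, minimum="470.00"):
    """True if the installed driver is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version; None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # one line per GPU; all report the same driver


version = installed_driver_version()
if version is None:
    print("nvidia-smi not found or failed; is the driver installed?")
elif driver_meets_minimum(version):
    print(f"driver {version} is new enough for CUDA 11.4")
else:
    print(f"driver {version} is older than 470.00")
```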

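After installation, set the environment variables.

Open the .bashrc file in your home directory and append the lines below, then run source ~/.bashrc or open a new terminal so they take effect.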

export CUDA_HOME="/usr/local/cuda-11.4"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_HOME/lib64"
export PATH="$PATH:$CUDA_HOME/bin"
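
To confirm that a new shell actually picked up these variables, a small Python check can help. This is only a sketch: the function name `check_cuda_env` is hypothetical, and the default path assumes the CUDA 11.4 install location used in this guide.

```python
import os


def check_cuda_env(env, cuda_home="/usr/local/cuda-11.4"):
    """Report whether CUDA_HOME, PATH and LD_LIBRARY_PATH point at cuda_home."""

    def entries(name):
        # Split a colon-separated variable and ignore trailing slashes.
        return [p.rstrip("/") for p in env.get(name, "").split(":") if p]

    return {
        "CUDA_HOME": env.get("CUDA_HOME", "").rstrip("/") == cuda_home,
        "PATH": f"{cuda_home}/bin" in entries("PATH"),
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64" in entries("LD_LIBRARY_PATH"),
    }


for name, ok in check_cuda_env(dict(os.environ)).items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```

Run it from a freshly opened terminal; any variable reported as missing means the corresponding export line was not loaded.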

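cuDNN Installation

Choose the cuDNN Library for Linux tar archive here (the Deb installation is error-prone):

cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz

Download and extract the archive, open a terminal in the extracted directory (it contains the include and lib folders), and run the following commands, which copy the cuDNN files into the corresponding CUDA directories: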

sudo cp -v include/* /usr/local/cuda/include/
sudo cp -v -P lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
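
One way to confirm the copy worked is to read the version macros from the installed cudnn_version.h, which defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. The following is a minimal sketch; the function name is hypothetical, and the header path assumes the /usr/local/cuda location used in the commands above.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn_version.h."""
    parts = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = rf"^\s*#define\s+{macro}\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None  # not a cuDNN version header
        parts.append(int(match.group(1)))
    return tuple(parts)


header = Path("/usr/local/cuda/include/cudnn_version.h")
if header.is_file():
    print("cuDNN version:", parse_cudnn_version(header.read_text()))
else:
    print(f"{header} not found; were the headers copied?")
```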

Testing CUDA

In a terminal, activate the virtual environment and run nvcc --version to check that CUDA is installed.
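
The same check can be scripted. The sketch below assumes that the `nvcc --version` output contains a "release X.Y" fragment, as CUDA releases print; the helper names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def nvcc_release():
    """Run nvcc if it is on PATH and parse its reported release."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


release = nvcc_release()
if release is None:
    print("nvcc not found; check that PATH includes the CUDA bin directory")
else:
    print("nvcc reports CUDA release", ".".join(map(str, release)))
```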

Then, with the virtual environment set up, test in Python:

import torch
from torch.backends import cudnn

print(torch.cuda.is_available())
torch.zeros(1).cuda()  # the line above may print True even when CUDA is not actually usable (e.g. a version mismatch); what matters is whether this line raises an error
print(cudnn.is_available())
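
Because a version mismatch is the failure mode called out in the comment above, it can help to print which CUDA and cuDNN versions PyTorch was built against. The following is a sketch rather than a definitive diagnostic; the function name `torch_cuda_report` is hypothetical, and it degrades gracefully when PyTorch is not installed.

```python
def torch_cuda_report():
    """Collect the CUDA-related facts PyTorch can report about itself."""
    try:
        import torch
    except ImportError:
        return {"torch": None}

    cudnn_ok = torch.backends.cudnn.is_available()
    report = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn": torch.backends.cudnn.version() if cudnn_ok else None,
    }
    if report["cuda_available"]:
        try:
            # A real allocation catches broken installs that still report True.
            torch.zeros(1).cuda()
            report["allocation"] = "ok"
        except RuntimeError as exc:
            report["allocation"] = f"failed: {exc}"
    return report


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

Compare built_for_cuda with the toolkit release reported by nvcc; a large gap between the two is a common source of runtime errors.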

You can also verify with the following command:

python -m torch.utils.collect_env

Uninstalling CUDA

Uninstalling CUDA takes a single command: run the uninstall script that ships with CUDA, located under the directory for your CUDA version:

sudo /usr/local/cuda-11.4/bin/cuda-uninstaller

If CUDA packages were also installed through apt, remove them as described in Installation Guide Linux :: CUDA Toolkit Documentation (nvidia.com):

sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" 
sudo apt-get --purge remove "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*" 

After uninstalling, some leftover directories remain. Since CUDA 11.4 was installed here, delete its directory as well:

sudo rm -rf /usr/local/cuda-11.4
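
Before deleting anything, it may be worth listing what is actually left under /usr/local. A minimal sketch follows, assuming CUDA was installed to the default location; the function name is hypothetical, and it only lists paths without removing them.

```python
from pathlib import Path


def leftover_cuda_dirs(root="/usr/local"):
    """List directories or symlinks under root whose names start with 'cuda'."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        str(path)
        for path in base.glob("cuda*")
        if path.is_dir() or path.is_symlink()
    )


for path in leftover_cuda_dirs():
    print(path)
```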