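## OnnxRuntime and CUDA Version Compatibility

![onnxruntime-cuda-version-mapping](images/onnxruntime-cuda-version-mapping.png)

## Nvidia Driver Installation

Check the GPU model:

```shell
lspci | grep -i nvidia
```

Output:

```log
03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 3GB] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
```

Next, download the matching driver from the Nvidia website: https://www.nvidia.cn/drivers/lookup/

![nvidia-driver-search](images/nvidia-driver-search.png)

Start the installation:

```shell
# the downloaded runfile is usually not executable yet
chmod +x NVIDIA-1060-Linux-x86_64-550.100.run
sudo ./NVIDIA-1060-Linux-x86_64-550.100.run
```

When the installation finishes, reboot and run `nvidia-smi` to confirm that the driver installed successfully.

![nvidia-driver](images/nvidia-driver.png)

## CUDA Installation

Install the CUDA Toolkit (11.4 here) with NVIDIA's runfile installer. When it finishes, it prints a summary like this:

```log
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.4/
Samples:  Installed in /home/yiidata/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.4/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 470.00 is required for CUDA 11.4 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
```

The driver was already installed in the previous step, so it was left unselected in the CUDA installer ("Driver: Not Selected"). You can ignore the "Incomplete installation" warning as long as the installed driver meets the stated minimum. Here, 550.100 is newer than the required 470.00.

After installation, set the environment variables the summary asks for. Open the `.bashrc` file in your home directory (mine is under /home/wangyuanwei) and append the following lines. If you cannot see the file in the file manager, press Ctrl+H to show hidden files.

```shell
export CUDA_HOME="/usr/local/cuda-11.4"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_HOME/lib64"
export PATH="$PATH:$CUDA_HOME/bin"
```

Run `source ~/.bashrc` or open a new terminal for the changes to take effect.

## cuDNN Installation

Choose "cuDNN Library for Linux" here, because the Deb packages tend to cause errors. After downloading, extract the archive and you will see a `cuda` folder. Open a terminal in the extracted folder that contains `include` and `lib`, then run the following commands. They copy the downloaded cuDNN files into the corresponding CUDA directories. Here `/usr/local/cuda` is the symlink to `/usr/local/cuda-11.4` that the runfile installer normally creates.

```shell
# include/* also covers cudnn_version.h
sudo cp -v include/* /usr/local/cuda/include/
# -P copies the libcudnn*.so symlinks as symlinks instead of duplicating the libraries
sudo cp -Pv lib/libcudnn* /usr/local/cuda/lib64/
# make the headers and libraries readable by all users
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
```

**Testing CUDA**

In a terminal, activate your virtual environment and run `nvcc --version` to check that CUDA is installed. Then test from Python inside the configured virtual environment:

```python
import torch
from torch.backends import cudnn

print(torch.cuda.is_available())
# is_available() can print True even when CUDA does not actually work
# (for example because of a version mismatch); whether this next line raises is the real test
torch.zeros(1).cuda()
print(cudnn.is_available())
```

## Uninstalling CUDA

Uninstalling CUDA takes a single command that runs the uninstall script shipped with CUDA. Find the script that matches your CUDA version:

```shell
sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
```

For the CUDA 11.4 installed above, the installer summary names `cuda-uninstaller` in /usr/local/cuda-11.4/bin instead.

Alternatively, if CUDA was installed through the package manager, follow Installation Guide Linux :: CUDA Toolkit Documentation (nvidia.com) and purge the packages:

```shell
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*"
sudo apt-get --purge remove "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
```

Some leftover directories remain after uninstalling. The version installed previously was CUDA 10.0, so you can delete its directory as well:

```shell
sudo rm -rf /usr/local/cuda-10.0/
```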
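The CUDA installer summary states that CUDA 11.4 needs a driver of version at least 470.00. As a minimal sketch, you can check this requirement in Python. It assumes the version string comes from `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and that driver versions compare numerically, component by component. The helper names `parse_version`, `driver_meets_minimum` and `query_driver_version` are hypothetical.

```python
import subprocess
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '550.100' into (550, 100)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_minimum(installed: str, minimum: str = "470.00") -> bool:
    """Compare component by component; '470.00' is the CUDA 11.4 minimum from the installer log."""
    return parse_version(installed) >= parse_version(minimum)


def query_driver_version() -> str:
    """Hypothetical helper: read the first GPU's driver version via nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()


if __name__ == "__main__":
    try:
        version = query_driver_version()
    except (OSError, subprocess.CalledProcessError, IndexError) as exc:
        print(f"could not query the driver version: {exc}")
    else:
        status = "OK" if driver_meets_minimum(version) else "too old for CUDA 11.4"
        print(f"driver {version}: {status}")
```

Comparing integer tuples instead of raw strings avoids treating a three-digit major version and a two-digit one as text, which would order them incorrectly.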
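The CUDA installer summary also asks that PATH include /usr/local/cuda-11.4/bin and LD_LIBRARY_PATH include /usr/local/cuda-11.4/lib64. The small sketch below checks an environment mapping for those two entries. The function name `missing_cuda_paths` is hypothetical, and the code assumes Linux-style `:`-separated search paths.

```python
import os
from typing import Dict, List


def missing_cuda_paths(env: Dict[str, str], cuda_home: str = "/usr/local/cuda-11.4") -> List[str]:
    """Report which of the directories named in the installer summary are missing."""
    home = cuda_home.rstrip("/")
    required = {
        "PATH": f"{home}/bin",
        "LD_LIBRARY_PATH": f"{home}/lib64",
    }
    missing = []
    for var, directory in required.items():
        # Linux joins search paths with ':'; strip trailing '/' so '.../bin/' also counts.
        entries = [entry.rstrip("/") for entry in env.get(var, "").split(":")]
        if directory not in entries:
            missing.append(f"{var} lacks {directory}")
    return missing


if __name__ == "__main__":
    problems = missing_cuda_paths(dict(os.environ))
    print("\n".join(problems) if problems else "PATH and LD_LIBRARY_PATH look fine")
```

Run it from a new terminal after editing `.bashrc`, since the current shell only sees the new variables after `source ~/.bashrc`.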
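To confirm which cuDNN version was copied into the CUDA directory, read the version numbers from `cudnn_version.h`, the header the cuDNN copy commands install. cuDNN 8 and later define `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` there, while older releases kept them in `cudnn.h`. This minimal sketch assumes the header sits at /usr/local/cuda/include/cudnn_version.h. The function name `cudnn_version` is hypothetical.

```python
import re
from pathlib import Path
from typing import Tuple

# Assumed location: the include directory the cuDNN cp commands copy into.
HEADER = Path("/usr/local/cuda/include/cudnn_version.h")


def cudnn_version(header_text: str) -> Tuple[int, int, int]:
    """Read CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL from the header text."""
    numbers = []
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        match = re.search(rf"#define\s+CUDNN_{name}\s+(\d+)", header_text)
        if match is None:
            raise ValueError(f"CUDNN_{name} not found in header")
        numbers.append(int(match.group(1)))
    return numbers[0], numbers[1], numbers[2]


if __name__ == "__main__":
    if HEADER.is_file():
        major, minor, patch = cudnn_version(HEADER.read_text())
        print(f"cuDNN {major}.{minor}.{patch}")
    else:
        print(f"{HEADER} not found; were the cuDNN headers copied?")
```

The regular expression only matches a number placed directly after each macro name. Derived lines such as the `CUDNN_VERSION` formula are therefore skipped.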
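You can also script the `nvcc --version` check from the Testing CUDA step. This sketch assumes the output contains a line of the form `Cuda compilation tools, release 11.4, ...` and extracts the release number, so you can compare it with the toolkit you intended to install. `nvcc_release` is a hypothetical helper name.

```python
import re
import subprocess
from typing import Optional


def nvcc_release(output: str) -> Optional[str]:
    """Pull the 'release X.Y' number out of `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        print("nvcc not found; check that PATH includes $CUDA_HOME/bin")
    else:
        print("CUDA toolkit release:", nvcc_release(result.stdout))
```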
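Since the goal is running OnnxRuntime on the GPU, you can ask ONNX Runtime itself which execution providers it supports, via `onnxruntime.get_available_providers()`. This minimal sketch assumes the GPU build (the `onnxruntime-gpu` package) is installed in the same virtual environment. `cuda_provider_available` is a hypothetical helper.

```python
from typing import Iterable


def cuda_provider_available(providers: Iterable[str]) -> bool:
    """True if ONNX Runtime lists the CUDA execution provider."""
    return "CUDAExecutionProvider" in providers


if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed (the GPU build is the onnxruntime-gpu package)")
    else:
        providers = ort.get_available_providers()
        print(providers)
        print("CUDA provider available:", cuda_provider_available(providers))
```

This mirrors the PyTorch check above. Seeing `CUDAExecutionProvider` in the list only means the build supports it. If the CUDA or cuDNN libraries cannot be loaded, a session may still fall back to the CPU. Calling `get_providers()` on an actual `InferenceSession` is therefore the stronger test.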