## ONNX Runtime and CUDA Version Compatibility

![onnxruntime-cuda-version-mapping](images/onnxruntime-cuda-version-mapping.png)

### Prerequisites

Libraries required on Ubuntu and Debian:

```shell
sudo apt update
sudo apt install net-tools
sudo apt install vim wget curl git
sudo apt install gcc make build-essential
sudo apt install -y libssl-dev zlib1g-dev
sudo apt install -y libbz2-dev libreadline-dev libsqlite3-dev llvm
sudo apt install -y libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev
sudo apt install -y libgdbm-dev libdb-dev libpcap-dev libexpat1-dev libc6-dev
```

## Installing the NVIDIA Driver

Check the graphics card model:

```shell
lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 3GB] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
```

Then download the matching driver from the NVIDIA website: https://www.nvidia.cn/drivers/lookup/

![nvidia-driver-search](images/nvidia-driver-search.png)

Start the installation:

```shell
chmod +x NVIDIA-1060-Linux-x86_64-550.100.run
sudo ./NVIDIA-1060-Linux-x86_64-550.100.run
```

When the installation finishes, reboot the machine, then run `nvidia-smi` to check that the driver was installed successfully.

![nvidia-driver](images/nvidia-driver.png)

## Installing CUDA

Download the CUDA Toolkit and cuDNN from the NVIDIA website.

**CUDA** https://developer.nvidia.com/cuda-toolkit-archive

**cuDNN** https://developer.nvidia.com/rdp/cudnn-archive

After downloading, upload the files to the target directory and run the installer:

```shell
sudo sh cuda_11.4.0_470.57.02_linux.run
```

During installation, type `accept` to accept the license agreement, then press Enter or choose OK at the remaining prompts.

When choosing which components to install, answer `n` for the driver (it is already installed; in the menu-style installer, press Space to deselect an item) and `y` for the remaining three items.

The installation succeeded if it ends with output like the following:

```log
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.4/
Samples:  Installed in /home/yiidata/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-11.4/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 470.00 is required for CUDA 11.4 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
```

The warning is expected here, because the driver was installed separately in the previous step.

Next, set the environment variables. Open the `.bashrc` file in your home directory and add the following lines, then reload it with `source ~/.bashrc` or open a new terminal:

```shell
export CUDA_HOME="/usr/local/cuda-11.4"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_HOME/lib64"
export PATH="$PATH:$CUDA_HOME/bin"
```

## Installing cuDNN

Choose the "cuDNN Library for Linux" tarball (the Deb packages are error-prone to install):

cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz

Download and extract the archive. The extracted directory contains `include` and `lib` folders. Open a terminal in that directory and run the following commands, which copy the cuDNN files into the corresponding CUDA directories:

```shell
# Copy the headers (this includes cudnn_version.h)
sudo cp -v include/* /usr/local/cuda/include/
# Copy the libraries; -P keeps the symlinks instead of duplicating the files
sudo cp -v -P lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
```

**Testing CUDA**

In a terminal, activate your virtual environment and run `nvcc --version` to check that CUDA is installed.

Then start Python inside the configured virtual environment and run:

```python
import torch
from torch.backends import cudnn

print(torch.cuda.is_available())
# is_available() may print True even when the setup is broken (for example,
# a CUDA version mismatch), so the real test is whether the next line raises.
torch.zeros(1).cuda()
print(cudnn.is_available())
```

You can also verify the environment with:

```shell
python -m torch.utils.collect_env
```

## Uninstalling CUDA

Uninstalling CUDA takes a single command, which runs the uninstall script that ships with CUDA. Locate the script that matches your CUDA version:

```shell
sudo /usr/local/cuda-11.4/bin/cuda-uninstaller
```

Alternatively, if CUDA was installed from apt packages, use the removal commands from the NVIDIA documentation (Installation Guide Linux :: CUDA Toolkit Documentation, nvidia.com):

```shell
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*"
sudo apt-get --purge remove "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
```

Some leftover directories may remain after uninstalling. Since the version installed here was CUDA 11.4, they can be removed as well:

```shell
sudo rm -rf /usr/local/cuda-11.4
```
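
To confirm that nothing is left behind after uninstalling, a small check like the one below may help. It is only a minimal sketch, assuming the toolkit lived under `/usr/local` as in this guide; the helper names `find_cuda_leftovers` and `report` are hypothetical and not part of any NVIDIA tool.

```python
import glob
import os
import shutil


def find_cuda_leftovers(prefix="/usr/local"):
    """Return directories or symlinks named cuda* that still exist under prefix.

    Hypothetical helper: the default prefix is an assumption based on the
    install location used in this guide.
    """
    pattern = os.path.join(prefix, "cuda*")
    return sorted(
        p for p in glob.glob(pattern) if os.path.isdir(p) or os.path.islink(p)
    )


def report(prefix="/usr/local"):
    """Print leftover CUDA paths and whether nvcc is still reachable."""
    leftovers = find_cuda_leftovers(prefix)
    if leftovers:
        print("Leftover CUDA paths:")
        for path in leftovers:
            print("  " + path)
    else:
        print("No CUDA directories found under " + prefix)
    # If nvcc is still found, some toolkit remains installed on PATH.
    nvcc = shutil.which("nvcc")
    print("nvcc on PATH: " + (nvcc if nvcc else "not found"))
    return leftovers


if __name__ == "__main__":
    report()
```

If the script still lists paths, review them before deleting anything; a `/usr/local/cuda` symlink may point at another CUDA version that you want to keep.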