Packages required on Ubuntu and Debian systems:
sudo apt update
sudo apt install net-tools
sudo apt install vim wget curl git
sudo apt install gcc make build-essential
sudo apt install -y libssl-dev zlib1g-dev
sudo apt install -y libbz2-dev libreadline-dev libsqlite3-dev llvm
sudo apt install -y libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev
sudo apt install -y libbz2-dev libssl-dev libncurses5-dev libsqlite3-dev libreadline-dev
sudo apt install -y tk-dev libgdbm-dev libdb-dev libpcap-dev xz-utils libexpat1-dev liblzma-dev libffi-dev libc6-dev
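Several packages appear in more than one of the commands above (libssl-dev, libbz2-dev, xz-utils, liblzma-dev, libffi-dev and others). apt ignores packages that are already installed, so the duplicates are harmless. If you prefer a single command, the following minimal sketch merges the lists and keeps each package once. The package strings are copied from the commands above, and the helper name merged_install_command is hypothetical.

```python
# Package lists copied from the apt commands above, one string per command.
APT_LINES = [
    "net-tools",
    "vim wget curl git",
    "gcc make build-essential",
    "libssl-dev zlib1g-dev",
    "libbz2-dev libreadline-dev libsqlite3-dev llvm",
    "libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev",
    "libbz2-dev libssl-dev libncurses5-dev libsqlite3-dev libreadline-dev",
    "tk-dev libgdbm-dev libdb-dev libpcap-dev xz-utils libexpat1-dev liblzma-dev libffi-dev libc6-dev",
]


def merged_install_command(lines):
    """Build one 'sudo apt install -y ...' command with duplicate packages removed.

    Packages keep the order in which they first appear.
    """
    seen = []
    for line in lines:
        for pkg in line.split():
            if pkg not in seen:
                seen.append(pkg)
    return "sudo apt install -y " + " ".join(seen)
```

Print merged_install_command(APT_LINES) and paste the result into a terminal.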
Check the graphics card model:
lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 3GB] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
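The GPU name needed on the driver download page is the text in square brackets on the "VGA compatible controller" line; the "Audio device" line is the card's HDMI audio function and can be ignored. A small sketch that extracts that name from lspci output (the function name is hypothetical; the sample text is the output shown above):

```python
import re

# The lspci output shown above.
SAMPLE = """03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 3GB] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)"""


def nvidia_gpu_models(lspci_output):
    """Return the names of NVIDIA display controllers found in lspci output."""
    models = []
    for line in lspci_output.splitlines():
        # Keep only NVIDIA VGA / 3D controllers; this skips the HDMI audio function.
        if "NVIDIA" not in line:
            continue
        if not re.search(r"VGA compatible controller|3D controller", line):
            continue
        # Prefer the bracketed product name; otherwise fall back to the full description.
        m = re.search(r"\[([^\]]+)\]", line)
        models.append(m.group(1) if m else line.split(": ", 1)[1])
    return models
```

On a real machine you could pass subprocess.run(["lspci"], capture_output=True, text=True).stdout instead of SAMPLE.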
Then download the matching driver from the NVIDIA website: https://www.nvidia.cn/drivers/lookup/
Run the installer:
sudo sh ./NVIDIA-Linux-x86_64-550.100.run
After the installation finishes, reboot the machine and run nvidia-smi to check that the driver was installed successfully.
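For a scripted check, nvidia-smi can print just the GPU name and driver version with --query-gpu=name,driver_version --format=csv,noheader. The sketch below runs that query when nvidia-smi is on PATH and parses each "name, version" row. The helper names are hypothetical, and the exact name string nvidia-smi reports may differ from the example row used in the usage note.

```python
import shutil
import subprocess

# nvidia-smi query that prints one "name, driver_version" row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_csv(text):
    """Parse 'name, driver_version' rows into (name, version) tuples."""
    rows = []
    for line in text.strip().splitlines():
        # Split on the last comma in case the GPU name itself contains a comma.
        name, version = (field.strip() for field in line.rsplit(",", 1))
        rows.append((name, version))
    return rows


def query_driver():
    """Run the query; return None if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)
```

For example, a row such as "NVIDIA GeForce GTX 1060 3GB, 550.100" parses to ("NVIDIA GeForce GTX 1060 3GB", "550.100"). A None result means the driver tools are not on PATH.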
Download the CUDA Toolkit and cuDNN from the NVIDIA website:
CUDA https://developer.nvidia.com/cuda-toolkit-archive
cuDNN https://developer.nvidia.com/rdp/cudnn-archive
After downloading, upload the files to a directory of your choice and run the CUDA installer:
sudo sh cuda_11.4.0_470.57.02_linux.run
During installation, type accept to accept the license agreement, then press Enter or select OK at each prompt.
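In the component selection list, set Driver to n (the driver is already installed; press Space to deselect it) and leave the remaining items at y, y, y.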
When the installation finishes, output like the following indicates success:
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.4/
Samples: Installed in /home/yiidata/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.4/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 470.00 is required for CUDA 11.4 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
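The warning only means the runfile did not install a driver, which is expected here. What matters is that the driver installed earlier is at least 470.00, the minimum the summary states for CUDA 11.4; 550.100 qualifies. Driver versions should be compared numerically, not as strings. A minimal sketch (the helper names are hypothetical):

```python
def version_tuple(version):
    """Turn a dotted version string such as '550.100' into (550, 100)."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(installed, required="470.00"):
    """True if the installed driver is at least the required version.

    Numeric comparison matters: as plain strings, '99.0' would sort above
    '470.00' because '9' > '4'.
    """
    return version_tuple(installed) >= version_tuple(required)
```

Pass the driver version reported by nvidia-smi, for example driver_supports("550.100").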
Next, set the environment variables.
Open .bashrc in your home directory and add the following lines:
export CUDA_HOME="/usr/local/cuda-11.4"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_HOME/lib64"
export PATH="$PATH:$CUDA_HOME/bin"
Save the file, then run source ~/.bashrc (or open a new terminal) so the changes take effect.
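To confirm that the new shell picked up the settings the installer summary asked for (PATH containing /usr/local/cuda-11.4/bin and LD_LIBRARY_PATH containing /usr/local/cuda-11.4/lib64), a sketch like the following reports whatever is missing. It reads os.environ by default; the function name is hypothetical.

```python
import os

CUDA_HOME = "/usr/local/cuda-11.4"


def missing_cuda_paths(env=None, cuda_home=CUDA_HOME):
    """List the PATH / LD_LIBRARY_PATH entries from the installer summary that are absent."""
    env = os.environ if env is None else env
    required = {
        "PATH": os.path.join(cuda_home, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_home, "lib64"),
    }
    missing = []
    for var, wanted in required.items():
        # Ignore empty entries, such as the leading ':' produced when
        # LD_LIBRARY_PATH was empty before the export line.
        entries = [os.path.normpath(p) for p in env.get(var, "").split(os.pathsep) if p]
        if os.path.normpath(wanted) not in entries:
            missing.append(f"{var} lacks {wanted}")
    return missing
```

An empty list means both variables are set as required.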
For cuDNN, choose "cuDNN Library for Linux" (the tar archive; installing from the Deb package is more error-prone), for example:
cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz
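Extract the downloaded archive, for example with tar -xvf cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz, then open a terminal in the extracted folder (it contains the include and lib directories) and run the following commands, which copy the cuDNN files into the corresponding CUDA directories: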
sudo cp -v include/* /usr/local/cuda/include/
sudo cp -v include/cudnn_version.h /usr/local/cuda/include/
sudo cp -Pv lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
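To check which cuDNN version ended up in the CUDA directory, you can read the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines in cudnn_version.h, which the commands above copied to /usr/local/cuda/include/. A minimal sketch, assuming that header location (the function name is hypothetical):

```python
import re
from pathlib import Path

# Location used by the copy commands above.
HEADER = Path("/usr/local/cuda/include/cudnn_version.h")


def cudnn_version(header_text):
    """Extract 'MAJOR.MINOR.PATCHLEVEL' from the text of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match only the '#define NAME <number>' line, not later expressions using NAME.
        m = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if m is None:
            raise ValueError(f"{name} not found in header")
        parts.append(m.group(1))
    return ".".join(parts)
```

Call cudnn_version(HEADER.read_text()). For the 8.6.0.163 archive above, the result should start with 8.6.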
Test CUDA
In a terminal, activate your virtual environment and run the following command to check that CUDA is installed:
nvcc --version
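The last lines of nvcc --version contain a line of the form "Cuda compilation tools, release X.Y, ...". This sketch pulls out the release number so you can compare it with the toolkit you meant to install (11.4 here). The function name is hypothetical, and the test sample line is illustrative.

```python
import re


def nvcc_release(version_output):
    """Return the 'X.Y' CUDA release reported by `nvcc --version`, or None if absent."""
    m = re.search(r"release (\d+\.\d+)", version_output)
    return m.group(1) if m else None
```

Feed it subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout and compare the result with "11.4".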
Then start Python inside the configured virtual environment and run:
import torch
from torch.backends import cudnn
print(torch.cuda.is_available())
torch.zeros(1).cuda()  # the line above can print True even if the setup is broken (e.g. a CUDA version mismatch); check whether this line raises an error
print(cudnn.is_available())
You can also verify the setup with:
python -m torch.utils.collect_env
Uninstalling CUDA takes a single command that runs the uninstall script shipped with CUDA; use the script that matches your CUDA version:
sudo /usr/local/cuda-11.4/bin/cuda-uninstaller
Alternatively, for package-manager installations, follow the removal steps in the official Installation Guide Linux :: CUDA Toolkit Documentation (nvidia.com):
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*"
sudo apt-get --purge remove "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
After uninstalling, some leftover directories may remain. Since CUDA 11.4 was installed here, delete its directory as well:
sudo rm -rf /usr/local/cuda-11.4
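Because rm -rf is irreversible, it can help to list what is actually left under /usr/local before deleting anything. This sketch lists any cuda-* directories there, plus the /usr/local/cuda symlink if one remains. It assumes the default /usr/local prefix, and the helper name is hypothetical.

```python
import glob
import os


def leftover_cuda_dirs(prefix="/usr/local"):
    """List versioned CUDA directories and any 'cuda' symlink/dir still under prefix."""
    found = sorted(glob.glob(os.path.join(prefix, "cuda-*")))
    link = os.path.join(prefix, "cuda")
    # lexists is also True for a dangling symlink whose target was already removed.
    if os.path.lexists(link):
        found.insert(0, link)
    return found
```

Review the returned paths, then delete only the ones that belong to the removed version.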