Check the GPU model:
lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 3GB] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
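If you want to pick the GPU model out of that output in a script (for example, before choosing a driver), a minimal Python sketch could look like this. The function names are hypothetical. It assumes the `lspci` line format shown above, where the model name sits in square brackets. It also assumes that GPUs appear as either "VGA compatible controller" or "3D controller" (the latter is typical of compute-only cards).

```python
import re
import subprocess

# Matches lines such as:
# 03:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1060 3GB] (rev a1)
GPU_LINE = re.compile(
    r"^(?P<slot>\S+) (?:VGA compatible controller|3D controller): "
    r"NVIDIA Corporation .*?\[(?P<model>[^\]]+)\]"
)


def nvidia_gpu_models(lspci_text: str) -> list:
    """Return the bracketed model names of NVIDIA GPUs found in `lspci` output."""
    models = []
    for line in lspci_text.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:  # audio functions and other devices do not match and are skipped
            models.append(match.group("model"))
    return models


def read_lspci() -> str:
    """Run `lspci` and return its stdout (requires pciutils to be installed)."""
    return subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    sample = (
        "03:00.0 VGA compatible controller: NVIDIA Corporation GP104 "
        "[GeForce GTX 1060 3GB] (rev a1)\n"
        "03:00.1 Audio device: NVIDIA Corporation GP104 High Definition "
        "Audio Controller (rev a1)\n"
    )
    print(nvidia_gpu_models(sample))  # ['GeForce GTX 1060 3GB']
```

On a real machine, call `nvidia_gpu_models(read_lspci())` instead of using the sample string.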
Then download the matching driver from the NVIDIA website: https://www.nvidia.cn/drivers/lookup/
Make the downloaded installer executable and run it:
chmod +x NVIDIA-Linux-x86_64-550.100.run
sudo ./NVIDIA-Linux-x86_64-550.100.run
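After the installation finishes, reboot the computer. Then run nvidia-smi to check that the driver was installed successfully.
Next, install the CUDA Toolkit (version 11.4 here) with NVIDIA's runfile installer. The driver is already installed, so deselect the Driver component. When the installer finishes, it prints a summary like this: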
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-11.4/
Samples: Installed in /home/yiidata/, but missing recommended libraries
Please make sure that
- PATH includes /usr/local/cuda-11.4/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-11.4/lib64, or, add /usr/local/cuda-11.4/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.4/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 470.00 is required for CUDA 11.4 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
Logfile is /var/log/cuda-installer.log
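The warning above says CUDA 11.4 needs a driver of at least 470.00. A small Python sketch can compare that minimum with the installed driver. The helper names are hypothetical. The query uses nvidia-smi's `--query-gpu=driver_version --format=csv,noheader` options. The comparison is numeric, field by field, because plain string comparison ranks versions incorrectly (for example, "99.0" would sort after "470.00").

```python
import subprocess

# Minimum driver for CUDA 11.4, taken from the installer warning above.
MIN_DRIVER_FOR_CUDA_11_4 = "470.00"


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '550.100' or '470.57.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_satisfies(installed: str, required: str = MIN_DRIVER_FOR_CUDA_11_4) -> bool:
    """True if `installed` >= `required`, comparing each numeric field in turn."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    # Pad the shorter version with zeros so '470' and '470.00' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()
```

For example, `driver_satisfies("550.100")` returns True, so the 550.100 driver installed earlier is new enough for CUDA 11.4.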
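The "Incomplete installation" warning is expected in this setup. It only means the CUDA installer did not install a driver itself, and the 550.100 driver installed earlier already meets the "at least 470.00" requirement.
Next, set the environment variables. Open the .bashrc file in your home directory and append the lines below. Mine is /home/wangyuanwei/.bashrc; if you can't see the file in the file manager, press Ctrl+H to show hidden files. Then run source ~/.bashrc or open a new terminal so the changes take effect.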
export CUDA_HOME="/usr/local/cuda-11.4"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_HOME/lib64"
export PATH="$PATH:$CUDA_HOME/bin"
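The installer summary asks for two things: PATH must include /usr/local/cuda-11.4/bin, and LD_LIBRARY_PATH must include /usr/local/cuda-11.4/lib64. You can verify both from a shell that has sourced the new .bashrc. The following is a minimal Python sketch; `missing_cuda_paths` is a hypothetical helper, and the toolkit location defaults to /usr/local/cuda-11.4 as used in this guide.

```python
import os

CUDA_HOME = "/usr/local/cuda-11.4"  # toolkit directory used in this guide


def missing_cuda_paths(env=None, cuda_home=CUDA_HOME) -> list:
    """Return a message for each CUDA directory missing from PATH / LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    wanted = {
        "PATH": os.path.join(cuda_home, "bin"),
        "LD_LIBRARY_PATH": os.path.join(cuda_home, "lib64"),
    }
    missing = []
    for var, directory in wanted.items():
        # Split the colon-separated list, drop empty entries and trailing slashes.
        entries = [p.rstrip("/") for p in env.get(var, "").split(":") if p]
        if directory not in entries:
            missing.append(f"{var} lacks {directory}")
    return missing


if __name__ == "__main__":
    problems = missing_cuda_paths()
    print("environment OK" if not problems else "\n".join(problems))
```

An empty list means both directories are on their search paths in the environment that ran the script.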
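Next, install cuDNN. On NVIDIA's cuDNN download page, choose a build for your CUDA version (11.x here). Pick the "cuDNN Library for Linux" archive rather than the .deb packages, which tend to cause installation problems.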
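Extract the downloaded archive and open a terminal in the extracted folder, the one that contains include/ and lib/. Older archives unpack to a cuda/ folder with lib64/ instead of lib/; in that case, adjust the paths below. Then run the following commands to copy the cuDNN headers and libraries into the CUDA installation. Here /usr/local/cuda is normally a symlink to /usr/local/cuda-11.4 that the CUDA installer creates.
sudo cp -v include/cudnn*.h /usr/local/cuda/include/
# -P copies the libcudnn*.so symlinks as symlinks instead of duplicating the library files
sudo cp -Pv lib/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*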
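To confirm which cuDNN version actually landed in /usr/local/cuda, you can read the version macros from the copied cudnn_version.h. Below is a minimal Python sketch. It assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain `#define NAME number` lines. The function name is hypothetical.

```python
import re
from pathlib import Path

HEADER = Path("/usr/local/cuda/include/cudnn_version.h")
FIELDS = ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL")


def cudnn_version_from_header(text: str) -> str:
    """Build 'MAJOR.MINOR.PATCH' from the #define lines of cudnn_version.h."""
    parts = []
    for name in FIELDS:
        match = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if match is None:
            raise ValueError(f"{name} not found; is this cudnn_version.h?")
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__" and HEADER.exists():
    print(cudnn_version_from_header(HEADER.read_text()))
```

A missing macro raises ValueError rather than printing a misleading version.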
Test CUDA
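In a terminal (for example, inside your virtual environment), run nvcc --version to check that the CUDA Toolkit is installed and on your PATH.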
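If you want a script to check the toolkit version, you can parse the "release X.Y" part of the `nvcc --version` output. The following is a minimal Python sketch. It assumes the output contains a line of the form "Cuda compilation tools, release 11.4, V11.4.48". The helper names are hypothetical.

```python
import re
import subprocess


def nvcc_release(output: str) -> tuple:
    """Parse the 'release X.Y' part of `nvcc --version` output into (X, Y)."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def installed_nvcc_release() -> tuple:
    """Run `nvcc --version`; raises FileNotFoundError if nvcc is not on PATH."""
    out = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return nvcc_release(out)
```

If `installed_nvcc_release()` raises FileNotFoundError, the PATH change from .bashrc has not taken effect in that shell.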
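Then activate your virtual environment, start Python and run the following test. Note that pip/conda builds of PyTorch ship their own CUDA runtime and cuDNN. This test therefore mainly checks the driver together with the libraries bundled with PyTorch.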
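import torch
from torch.backends import cudnn

print(torch.cuda.is_available())  # True if PyTorch can see a usable GPU
# is_available() can print True even though the setup is broken (e.g. a CUDA version
# mismatch), so actually put a tensor on the GPU: this line must not raise an error
torch.zeros(1).cuda()
print(torch.version.cuda)         # CUDA version this PyTorch build was compiled against
print(cudnn.is_available())       # True if cuDNN can be used
print(cudnn.version())            # cuDNN version as an integer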
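The checks above stop at the first exception. If you would rather collect every result, you can wrap them so that failures are recorded instead. The following is a sketch, not an official PyTorch tool. The function name `gpu_smoke_test` and the report keys are hypothetical, and any exception raised while moving the tensor to the GPU is stored as text in the report.

```python
def gpu_smoke_test() -> dict:
    """Run the PyTorch GPU checks without raising; every report key is always present."""
    report = {
        "torch_imported": False,
        "cuda_available": False,
        "tensor_on_gpu": False,
        "cudnn_available": False,
        "error": None,
    }
    try:
        import torch
        from torch.backends import cudnn
    except ImportError as exc:
        report["error"] = f"import failed: {exc}"
        return report
    report["torch_imported"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    try:
        torch.zeros(1).cuda()  # the real test: allocate a tensor on the GPU
        report["tensor_on_gpu"] = True
    except Exception as exc:  # e.g. a CPU-only build or a driver/runtime mismatch
        report["error"] = repr(exc)
    report["cudnn_available"] = bool(cudnn.is_available())
    return report


if __name__ == "__main__":
    for key, value in gpu_smoke_test().items():
        print(f"{key}: {value}")
```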
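Uninstalling CUDA is simple: a single command runs the uninstall script that ships with CUDA. Find the script that matches your CUDA version. For the CUDA 11.4 runfile install above, it is /usr/local/cuda-11.4/bin/cuda-uninstaller, as the installer summary says. For CUDA 10.0, it was: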
sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
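Alternatively, if CUDA was installed through the package manager, remove the packages as described in the Installation Guide Linux :: CUDA Toolkit Documentation (nvidia.com):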
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*"
sudo apt-get --purge remove "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
Some leftover directories may remain after uninstalling. My previous install was CUDA 10.0, so its directory can be deleted as well:
sudo rm -rf /usr/local/cuda-10.0/
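Before deleting anything, it helps to see which /usr/local/cuda-X.Y directories exist and whether each still contains an uninstall script. The following Python sketch takes the two script names from this document: cuda-uninstaller (CUDA 11.4) and uninstall_cuda_X.Y.pl (CUDA 10.0). It assumes the scripts live in each version's bin/ directory. `find_cuda_installs` is a hypothetical helper and only lists paths; it does not remove anything.

```python
from pathlib import Path
from typing import Dict, Optional

# Script names seen in this guide: the CUDA 11.4 summary and the CUDA 10.0 example.
UNINSTALLER_PATTERNS = ("cuda-uninstaller", "uninstall_cuda_*.pl")


def find_cuda_installs(root: str = "/usr/local") -> Dict[str, Optional[str]]:
    """Map each <root>/cuda-X.Y directory to its uninstall script, or None if absent."""
    installs: Dict[str, Optional[str]] = {}
    for cuda_dir in sorted(Path(root).glob("cuda-*")):
        if not cuda_dir.is_dir():
            continue
        script = None
        for pattern in UNINSTALLER_PATTERNS:
            hits = sorted((cuda_dir / "bin").glob(pattern))
            if hits:
                script = str(hits[0])
                break
        installs[str(cuda_dir)] = script
    return installs


if __name__ == "__main__":
    for directory, script in find_cuda_installs().items():
        print(directory, "->", script or "no uninstaller (leftover directory?)")
```

A directory reported without an uninstaller is a likely leftover, such as the CUDA 10.0 folder removed above.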