
Improve documentation, verify environment

zhzhenqin 5 months ago
commit 55b606ebae
2 changed files with 79 additions and 8 deletions
  1. docs/N卡驱动及CUDA安装.md (+21 -7)
  2. docs/YOLO环境安装.md (+58 -1)

+ 21 - 7
docs/N卡驱动及CUDA安装.md

@@ -57,14 +57,20 @@ https://developer.nvidia.com/rdp/cudnn-archive
 ```shell
 sudo sh cuda_11.4.0_470.57.02_linux.run
 ```
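-During installation
-press Enter through every prompt
+
+During installation,
+accept the license agreement:
 accept
-n (do not install the driver; one is already installed)
+Press Enter through the prompts, or select OK.
+
+When selecting the components to install,
+n (do not install the driver since one is already installed; press Space to deselect it)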
 y
 y
 y
 
+The installation succeeded if it ends with the following output:
+
 ```log
 ===========
 = Summary =
@@ -88,7 +94,7 @@ Logfile is /var/log/cuda-installer.log
 
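 After installation, set the environment variables.
 
-Open the .bashrc file in your home directory and add the following paths. For example, my .bashrc file is under /home/wangyuanwei; if you cannot find it, press Ctrl+H to show hidden files
+Open the `.bashrc` file in your home directory and add the following environment variables: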
 
 ```shell
 export CUDA_HOME="/usr/local/cuda-11.4"
@@ -100,6 +106,8 @@ export PATH="$PATH:$CUDA_HOME/bin"
 
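 Choose cuDNN Library for Linux here (the Deb package is error-prone to install).
 
+Archive used here: `cudnn-linux-x86_64-8.6.0.163_cuda11-archive.tar.xz`
+
 Download and extract the archive; you will see a cuda folder. Open a terminal in the current directory and run the following commands (i.e., copy the downloaded cuDNN files into the corresponding CUDA directories):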
 
 ```shell
@@ -125,12 +133,18 @@ torch.zeros(1).cuda() # the line above may print True even when the CUDA version does not match, etc.
 print(cudnn.is_available())
 ```
 
+Alternatively, verify with the following command:
+
+```shell
+python -m torch.utils.collect_env
+```
+
 ## 卸载 CUDA
 
 Uninstalling CUDA is simple: a single command runs the uninstall script that ships with CUDA. Locate the uninstall script that matches your CUDA version:
 
 ```shell
-sudo /usr/local/cuda-10.0/bin/uninstall_cuda_10.0.pl
+sudo /usr/local/cuda-11.4/bin/cuda-uninstaller
 ```
 
@@ -142,8 +156,8 @@ sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*"
 sudo apt-get --purge remove "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*" 
 ```
 
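-After uninstalling, some leftover directories remain; CUDA 10.0 was installed previously. They can be deleted as well:
+After uninstalling, some leftover directories remain; CUDA 11.4 was installed previously. They can be deleted as well: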
 
 ```shell
-sudo rm -rf /usr/local/cuda-10.0/
+sudo rm -rf /usr/local/cuda-11.4
 ```
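As a complementary check after the cuDNN copy step (this is not part of the commit), the installed cuDNN version can be read back from its header. The following is a minimal Python sketch. It assumes the cuDNN 8.x layout, which defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` in `include/cudnn_version.h`. It also assumes the headers were copied under `$CUDA_HOME/include`, as in the `.bashrc` step. Both function names are hypothetical.

```python
import os
import re
from pathlib import Path
from typing import Optional

def parse_cudnn_version(header_text: str) -> str:
    """Return 'major.minor.patch' parsed from the text of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Matches lines such as '#define CUDNN_MAJOR 8'
        match = re.search(r"#define\s+" + name + r"\s+(\d+)", header_text)
        if match is None:
            raise ValueError(name + " not found in header")
        parts.append(match.group(1))
    return ".".join(parts)

def installed_cudnn_version(cuda_home: Optional[str] = None) -> str:
    """Read the cuDNN version from <cuda_home>/include/cudnn_version.h.

    Falls back to $CUDA_HOME, then to /usr/local/cuda-11.4 (the path used in this guide).
    """
    root = cuda_home or os.environ.get("CUDA_HOME", "/usr/local/cuda-11.4")
    header = Path(root) / "include" / "cudnn_version.h"
    return parse_cudnn_version(header.read_text())
```

With the `8.6.0.163` archive installed, `print(installed_cudnn_version())` should print `8.6.0`. A `FileNotFoundError` usually means the headers were not copied into `$CUDA_HOME/include`.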

+ 58 - 1
docs/YOLO环境安装.md

@@ -19,4 +19,61 @@ pip install torch==1.13.1 torchvision==0.14.1 ultralytics==8.1.42 torchaudio==0.
 pip install ultralytics
 ```
 
-Since there is no GPU, there is no need to install the torch-cuda or onnx-gpu builds.
+Since there is no GPU, there is no need to install the onnx-gpu build.
+
+To install with CUDA support:
+
+```shell
+pip install onnx==1.16.1 onnxruntime-gpu==1.16.3
+pip install tensorboard==2.8.0 tensorflow-gpu==2.8.4
+pip install nvidia-cuda-nvrtc-cu11==11.7.99
+pip install nvidia-cuda-runtime-cu11==11.7.99
+```
+
+
+## Environment verification
+
+```shell
+python -m torch.utils.collect_env
+```
+
+Sample output:
+
+```log
+Collecting environment information...
+PyTorch version: 1.13.1+cu117
+Is debug build: False
+CUDA used to build PyTorch: 11.7
+ROCM used to build PyTorch: N/A
+
+OS: Ubuntu 20.04.6 LTS (x86_64)
+GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
+Clang version: Could not collect
+CMake version: Could not collect
+Libc version: glibc-2.31
+
+Python version: 3.8.19 (default, Aug  7 2024, 15:36:56)  [GCC 9.4.0] (64-bit runtime)
+Python platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.29
+Is CUDA available: True
+CUDA runtime version: 11.4.152
+CUDA_MODULE_LOADING set to: LAZY
+GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060 3GB
+Nvidia driver version: 550.100
+cuDNN version: Probably one of the following:
+/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn.so.8
+/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
+/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
+/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
+/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
+/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
+/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
+HIP runtime version: N/A
+MIOpen runtime version: N/A
+Is XNNPACK available: True
+
+Versions of relevant libraries:
+[pip3] numpy==1.24.4
+[pip3] torch==1.13.1
+[pip3] torchvision==0.14.1
+[conda] Could not collect
+```
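The `collect_env` report is plain `key: value` text, so its key fields can be checked mechanically, for example in a setup script. The following is a minimal sketch that assumes the report format shown above, with one `key: value` pair per line. Continuation lines such as the list of cuDNN library paths are skipped, and absent values are assumed to print as `Could not collect`. The helper names `parse_collect_env`, `check_gpu_ready` and `run_collect_env` are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List

MISSING = "Could not collect"  # placeholder collect_env prints for absent values

def parse_collect_env(report: str) -> Dict[str, str]:
    """Turn the 'Key: value' lines of the collect_env report into a dict."""
    info: Dict[str, str] = {}
    for line in report.splitlines():
        # Split on the first ': ' only, so values like 'GPU 0: NVIDIA ...' stay intact
        key, sep, value = line.partition(": ")
        if sep and key:  # skip blank lines, bare library paths and [pip3] entries
            info[key.strip()] = value.strip()
    return info

def check_gpu_ready(info: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the GPU stack looks usable."""
    problems = []
    if info.get("Is CUDA available") != "True":
        problems.append("PyTorch cannot see a CUDA device")
    if info.get("GPU models and configuration", MISSING) == MISSING:
        problems.append("no GPU model reported")
    if info.get("cuDNN version", MISSING) == MISSING:
        problems.append("cuDNN not found")
    return problems

def run_collect_env() -> str:
    """Run 'python -m torch.utils.collect_env' with the current interpreter (needs torch)."""
    result = subprocess.run(
        [sys.executable, "-m", "torch.utils.collect_env"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Typical use is `check_gpu_ready(parse_collect_env(run_collect_env()))`. A healthy report like the one above should give an empty list.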