YOLO Model Training

Training YOLO on a dataset from the command line

yolo train data=coco128.yaml model=yolov8n.pt epochs=10 lr0=0.01 device='0,1'

yolo task=detect mode=train model=yolov8x.yaml data=mydata.yaml epochs=10 batch=16

yolo task=segment mode=predict model=yolov8x-seg.pt source='/kaggle/input/personpng/1.jpg'

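The parameters used above are:

  • task: the type of task, one of ['detect', 'segment', 'classify', 'init']
  • mode: whether to train, validate, or predict, one of ['train', 'val', 'predict']
  • model: the model to use, either a YOLOv8 configuration file (yolov8s.yaml, yolov8m.yaml, yolov8l.yaml, yolov8x.yaml) or a pretrained weights file such as yolov8n.pt
  • data: the dataset configuration file you generated
  • epochs: how many times the whole dataset is iterated over during training. Lower it if training takes too long on your GPU.
  • batch: the mini-batch size for gradient descent, i.e. how many images are processed before a weight update. Lower it if your GPU runs out of memory.
  • device: cpu, '0', or '0,1': run on the CPU, or on the GPUs with the given indices
  • imgsz: the input image size. Lower it if your GPU runs out of memory.
  • name: the name of the run, under which the trained model is saved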
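To make the relationship between `epochs` and `batch` concrete, the sketch below counts how many mini-batches a run processes. It is plain arithmetic, not an ultralytics API. The helper name `count_batches` is made up for illustration, and the example assumes a dataset of 128 images (the size of coco128) with `batch=16` and `epochs=10`. A trainer may also accumulate gradients over several mini-batches, so the number of actual weight updates can be lower than the batch count.

```python
import math
from typing import Tuple


def count_batches(num_images: int, batch: int, epochs: int) -> Tuple[int, int]:
    """Return (mini-batches per epoch, mini-batches over the whole run)."""
    per_epoch = math.ceil(num_images / batch)  # the last batch may be partial
    return per_epoch, per_epoch * epochs


# Assumed example: 128 images, batch=16, epochs=10
per_epoch, total = count_batches(128, 16, 10)
print(per_epoch, total)  # 8 80
```

Halving `batch` doubles the number of mini-batches per epoch while roughly halving the memory needed per step, which is why lowering it helps on a GPU with little memory.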

Commands actually run:

yolo train data=/home/jxft/datasets/hyd-action.yaml model=/home/jxft/datasets/yolov8/yolov8n.pt epochs=100 lr0=0.01
yolo train data=/home/yiidata/datasets/hyd-action.yaml model=/home/yiidata/datasets/yolov8/yolov8n.pt epochs=10 lr0=0.01 device='0'

# yolov7
python train.py --weights=/home/yiidata/datasets/yolov7/yolov7.pt --data=/home/yiidata/datasets/hyd-action.yaml --img-size='640' --epochs=100 --batch-size=1 --device=0
# Export an ONNX model with NMS built in; each detection in its output has 7 values
python export.py --weights runs/train/exp/weights/best.pt --grid --end2end --simplify --max-wh 640
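With `--end2end`, the exported ONNX model runs NMS itself and returns one row of 7 values per kept detection. The sketch below shows one way such rows could be unpacked. The column order `(batch_id, x0, y0, x1, y1, class_id, score)` is an assumption based on the ONNX inference example in the YOLOv7 repository and should be checked against your own exported model. The `Detection` type, the `parse_detections` helper, and the sample rows are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Sequence, Tuple


class Detection(NamedTuple):
    batch_id: int
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1) as emitted by the model
    class_id: int
    score: float


def parse_detections(rows: Iterable[Sequence[float]], min_score: float = 0.25) -> List[Detection]:
    """Convert rows of 7 values into Detection records, dropping low scores."""
    detections = []
    for row in rows:
        if len(row) != 7:
            raise ValueError(f"expected 7 values per detection, got {len(row)}")
        # Assumed column order; verify it against your exported model.
        batch_id, x0, y0, x1, y1, class_id, score = row
        if score >= min_score:
            detections.append(
                Detection(int(batch_id), (x0, y0, x1, y1), int(class_id), float(score))
            )
    return detections


# Made-up rows for illustration only; real rows come from the ONNX model output.
sample = [
    [0, 10.0, 20.0, 110.0, 220.0, 0, 0.90],
    [0, 50.0, 60.0, 80.0, 90.0, 2, 0.10],
]
for det in parse_detections(sample):
    print(det)
```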

Running YOLO on a dataset from Python code

from ultralytics import YOLO

# Create a new YOLO model from scratch
model = YOLO('yolov8n.yaml')
# Load a pretrained YOLO model (recommended for training)
model = YOLO('yolov8n.pt')

# Train the model using the 'coco128.yaml' dataset for 3 epochs
results = model.train(data='coco128.yaml', epochs=3)
# Evaluate the model's performance on the validation set
results = model.val()
# Perform object detection on an image using the model
results = model('https://ultralytics.com/images/bus.jpg')
# Export the model to ONNX format
success = model.export(format='onnx')

Training parameters:

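from pathlib import Path

from ultralytics import YOLO

workers = 1
batch = 8
data_name = "TrafficSign"
# Placeholder: the original snippet never defines data_path; point it at your dataset config file
data_path = data_name + ".yaml"
# Path.resolve() stands in for the original's undefined abs_path(..., path_type='current') helper
weights = Path('./weights/yolov5nu.pt').resolve()
model = YOLO(str(weights), task='detect')  # load the pretrained YOLOv5nu weights
results = model.train(  # start training
    data=data_path,  # path to the dataset configuration file
    device='cpu',  # train on the CPU
    workers=workers,  # number of worker processes for data loading (1 here)
    imgsz=640,  # input image size of 640x640
    epochs=100,  # train for 100 epochs
    batch=batch,  # batch size of 8
    name='train_v5_' + data_name  # name of the training run
)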
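The `data` argument points to a dataset configuration file, whose contents these notes do not show. The sketch below renders a minimal one, assuming the layout used by ultralytics detection datasets such as coco128.yaml (`path`, `train`, `val`, `names`). The `make_dataset_yaml` helper, the directory layout, and the class names are hypothetical and only illustrate the shape of the file.

```python
from typing import Sequence


def make_dataset_yaml(root: str, train: str, val: str, names: Sequence[str]) -> str:
    """Render a minimal detection dataset config as YAML text."""
    lines = [
        f"path: {root}",    # dataset root directory
        f"train: {train}",  # training images, relative to path
        f"val: {val}",      # validation images, relative to path
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(names)]  # class index -> class name
    return "\n".join(lines) + "\n"


# Hypothetical layout and class names for a "TrafficSign" dataset.
text = make_dataset_yaml("datasets/TrafficSign", "images/train", "images/val", ["stop", "yield"])
print(text)
```

Writing `text` to a file such as TrafficSign.yaml and passing that file's path as `data` would connect it to the training call.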

Q&A

1. Error: RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm`

Error log:

Transferred 319/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train3', view at http://localhost:6006/
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Traceback (most recent call last):
  File "/usr/local/bin/yolo", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/cfg/__init__.py", line 582, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/model.py", line 667, in train
    self.trainer.train()
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 198, in train
    self._do_train(world_size)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 312, in _do_train
    self._setup_train(world_size)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/trainer.py", line 256, in _setup_train
    self.amp = torch.tensor(check_amp(self.model), device=self.device)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/utils/checks.py", line 655, in check_amp
    assert amp_allclose(YOLO("yolov8n.pt"), im)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/utils/checks.py", line 642, in amp_allclose
    a = m(im, device=device, verbose=False)[0].boxes.data  # FP32 inference
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/model.py", line 176, in __call__
    return self.predict(source, stream, **kwargs)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/model.py", line 444, in predict
    self.predictor.setup_model(model=self.model, verbose=is_cli)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/engine/predictor.py", line 297, in setup_model
    self.model = AutoBackend(
  File "/usr/local/python3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/nn/autobackend.py", line 144, in __init__
    model = model.fuse(verbose=verbose)
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/nn/tasks.py", line 184, in fuse
    m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
  File "/usr/local/python3/lib/python3.8/site-packages/ultralytics/utils/torch_utils.py", line 196, in fuse_conv_and_bn
    fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`

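On Linux, unsetting the LD_LIBRARY_PATH environment variable resolves this error. A likely cause is that LD_LIBRARY_PATH points at a system CUDA installation, so its cuBLAS library gets loaded instead of the version PyTorch was built against.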

unset LD_LIBRARY_PATH
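If training is launched from a Python script rather than an interactive shell, the same effect can be had by removing the variable only for the child process. This is a minimal sketch using the standard library. The helper names `env_without` and `run_without_ld_library_path` are made up for illustration, and the demo command just reports whether the variable is still visible to the child.

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional


def env_without(var: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with one variable removed."""
    env = dict(os.environ if base is None else base)
    env.pop(var, None)  # no error if the variable is not set
    return env


def run_without_ld_library_path(cmd: List[str]) -> int:
    """Run cmd with LD_LIBRARY_PATH removed and return its exit code."""
    return subprocess.run(cmd, env=env_without("LD_LIBRARY_PATH")).returncode


# Demo: the child process prints whether LD_LIBRARY_PATH is still set for it.
check = "import os; print('LD_LIBRARY_PATH' in os.environ)"
exit_code = run_without_ld_library_path([sys.executable, "-c", check])
```

To launch training this way, the demo command could be replaced with the argument list of the `yolo train ...` command, for example `["yolo", "train", "data=coco128.yaml", "model=yolov8n.pt", "epochs=10"]`. The parent process keeps its own environment unchanged.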

References