









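Since its release on May 18, 2020, YOLOv5 has gone through several iterations. The latest version is v7.0, which adds segmentation support. Many blog posts already explain how YOLOv5 works and how to train it on your own labeled data, for example "YOLOv5 Network Explained", "YOLO Series Made Simple: A Complete Guide to YOLOv5 Core Fundamentals" and "Hands-On Object Detection with Deep Learning (1): A Quick Taste of How Cool Object Detection Is".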


    # Clone the repository and install the dependencies
    git clone https://github.com/ultralytics/yolov5  # clone
    cd yolov5
    pip install -r requirements.txt  # install


A quick test with PyTorch Hub:

    import torch

    # Load the model
    model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5n - yolov5x6, custom
    # Image path
    img = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, PIL, OpenCV, numpy, list
    # Run inference
    results = model(img)
    # Print a summary of the detections
    results.print()  # or .show(), .save(), .crop(), .pandas(), etc.


Detection can also be run from the command line with detect.py; --source selects the input:

    python detect.py --weights yolov5s.pt --source 0                               # webcam
                                                   img.jpg                         # image
                                                   vid.mp4                         # video
                                                   screen                          # screenshot
                                                   path/                           # directory
                                                   list.txt                        # list of images
                                                   list.streams                    # list of streams
                                                   'path/*.jpg'                    # glob
                                                   'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                                                   'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
Speed and accuracy comparison of the YOLOv5 v7.0 models



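Since v6.0, YOLOv5 has one small change compared with earlier versions: the first layer of the network (previously the Focus module) was replaced by a 6x6 convolution (stride 2, padding 2). The two are theoretically equivalent, but on some current GPUs (and their optimized kernels) a 6x6 convolution is more efficient than the Focus module; see issue #4825 for details. The figure below shows the original Focus module (similar to Patch Merging in Swin Transformer): each 2x2 block of neighbouring pixels forms a patch, and the pixels at the same position (same colour) in every patch are gathered together, giving 4 feature maps that are concatenated and then fed to a 3x3 convolution. This is equivalent to applying a single 6x6 convolution directly.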

Since v6.0, YOLOv5 replaces Focus with an equivalent 6x6 convolution to simplify deployment
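The equivalence can be checked numerically. The NumPy sketch below is not YOLOv5 code; `focus`, `conv2d` and `focus_weights_to_6x6` are illustrative helpers. It compares Focus slicing followed by a 3x3 convolution (stride 1, padding 1) with a single 6x6 convolution (stride 2, padding 2) whose weights are rearranged from the 3x3 ones. Bias, batch norm and activation are left out because they act identically after either convolution.

```python
import numpy as np


def focus(x):
    """Focus slicing: (C, H, W) -> (4C, H/2, W/2), same slice order as YOLOv5's Focus."""
    return np.concatenate(
        [x[:, 0::2, 0::2], x[:, 1::2, 0::2], x[:, 0::2, 1::2], x[:, 1::2, 1::2]], axis=0
    )


def conv2d(x, w, stride, pad):
    """Naive 2-D cross-correlation without bias: x is (C, H, W), w is (O, C, k, k)."""
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    k = w.shape[-1]
    h_out = (x.shape[1] - k) // stride + 1
    w_out = (x.shape[2] - k) // stride + 1
    y = np.zeros((w.shape[0], h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i * stride:i * stride + k, j * stride:j * stride + k]
            y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return y


def focus_weights_to_6x6(w3):
    """Rearrange 3x3 weights over the 4C Focus channels into a 6x6 kernel over C channels."""
    out_c, c4 = w3.shape[:2]
    c = c4 // 4
    w6 = np.zeros((out_c, c, 6, 6))
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (row, col) offset of each Focus slice
    for s, (dy, dx) in enumerate(offsets):
        for a in range(3):
            for b in range(3):
                # tap (a, b) of slice s reads input pixel (2a + dy, 2b + dx) of the 6x6 window
                w6[:, :, 2 * a + dy, 2 * b + dx] = w3[:, s * c:(s + 1) * c, a, b]
    return w6


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))       # tiny 3-channel "image"
w3 = rng.standard_normal((5, 12, 3, 3))  # 3x3 conv after Focus: 12 -> 5 channels
y_focus = conv2d(focus(x), w3, stride=1, pad=1)
y_6x6 = conv2d(x, focus_weights_to_6x6(w3), stride=2, pad=2)
print(y_focus.shape, np.allclose(y_focus, y_6x6))  # (5, 4, 4) True
```

The padding also lines up: one padded row of the half-resolution Focus map corresponds to two padded rows of the full-resolution input.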

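In the Neck, SPP was replaced by SPPF (designed by Glenn Jocher himself). Both serve the same purpose, but SPPF is more efficient. SPP passes the input through several MaxPool layers of different sizes in parallel and then fuses the results, which helps with multi-scale objects to some extent. SPPF instead passes the input through several 5x5 MaxPool layers in series. Note that two 5x5 MaxPool layers in series (stride 1) give exactly the same result as one 9x9 MaxPool layer, and three 5x5 MaxPool layers in series give the same result as one 13x13 MaxPool layer.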
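A small NumPy sketch can confirm that the two blocks produce identical feature maps. It keeps only the pooling and concatenation core; the 1x1 convolutions that YOLOv5's SPP and SPPF modules apply before and after are omitted, and the function names are illustrative. Padding uses -inf, matching how PyTorch's MaxPool2d treats its implicit padding.

```python
import numpy as np


def max_pool_same(x, k):
    """Stride-1 k x k max pooling with 'same' padding on a 2-D map (padding value -inf)."""
    p = k // 2
    xp = np.pad(x, p, constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out


def spp(x):
    """SPP core: 5x5, 9x9 and 13x13 pools in parallel, stacked with the input."""
    return np.stack([x, max_pool_same(x, 5), max_pool_same(x, 9), max_pool_same(x, 13)])


def sppf(x):
    """SPPF core: three 5x5 pools in series, each intermediate result kept."""
    y1 = max_pool_same(x, 5)
    y2 = max_pool_same(y1, 5)  # same as a 9x9 pool of x
    y3 = max_pool_same(y2, 5)  # same as a 13x13 pool of x
    return np.stack([x, y1, y2, y3])


x = np.random.default_rng(0).standard_normal((20, 20))
print(np.array_equal(spp(x), sppf(x)))  # True
```

The saving comes from reuse: each SPPF pool looks at a 5x5 window of the previous result (3 x 25 = 75 comparisons per pixel in this naive version) instead of separate 5x5, 9x9 and 13x13 windows (25 + 81 + 169 = 275).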





Copy paste: randomly paste some objects into the image. This only works if the data has segments, i.e., instance segmentation labels for every object.

Copy paste (used only in segmentation training)
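A minimal NumPy sketch of the idea, assuming a dense boolean instance mask rather than the polygon segments YOLOv5 actually stores. As far as I understand YOLOv5's `copy_paste`, it pastes a horizontally mirrored copy of an instance and skips pastes that overlap existing objects too much; here `copy_paste_mirrored` and the 0.3 overlap limit (checked only against the source instance) are illustrative simplifications.

```python
import numpy as np


def copy_paste_mirrored(img, mask, p=0.5, max_overlap=0.3, rng=None):
    """Paste a horizontally mirrored copy of one instance back into the image.

    img:  (H, W, 3) uint8 image
    mask: (H, W) bool mask of one instance (derived from its segmentation label)
    Returns (new_img, new_mask); new_mask is None when nothing was pasted.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() >= p:
        return img, None
    new_mask = mask[:, ::-1]  # where the mirrored copy lands
    # Skip the paste if the copy would mostly cover the original instance
    overlap = (new_mask & mask).sum() / max(new_mask.sum(), 1)
    if overlap >= max_overlap:
        return img, None
    out = img.copy()
    out[new_mask] = img[:, ::-1][new_mask]  # copy the mirrored pixels
    return out, new_mask


img = np.zeros((4, 6, 3), np.uint8)
mask = np.zeros((4, 6), bool)
img[1:3, 0:2] = 255  # a 2x2 white "object" on the left
mask[1:3, 0:2] = True
out, new_mask = copy_paste_mirrored(img, mask, p=1.0)
print(out[1:3, 4:6, 0])  # the pasted copy appears on the right edge
```

The new mask would then be turned into an extra label (segment and box) for the pasted instance.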

Random affine (Rotation, Scale, Translation and Shear): apply a random affine transformation. According to the hyperparameters in the config file, however, only Scale and Translation are actually used.
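A NumPy sketch of the scale-plus-translation part, applied to box labels in pixel xyxy format. The matrix composition (move the image centre to the origin, scale, then shift by 0.5 ± `translate` of the image size) follows my reading of YOLOv5's `random_perspective`, and the defaults `scale=0.5`, `translate=0.1` mirror what I believe the default hyperparameter file uses; treat both as assumptions. The image itself would be warped with the same matrix (for example with `cv2.warpAffine`), which is left out here.

```python
import numpy as np


def scale_translate_matrix(s, tx, ty, width, height):
    """3x3 matrix: move the image centre to the origin, scale by s, then shift by (tx, ty) pixels."""
    C = np.array([[1, 0, -width / 2], [0, 1, -height / 2], [0, 0, 1]], dtype=float)
    S = np.diag([s, s, 1.0])
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)
    return T @ S @ C


def random_scale_translate(width, height, scale=0.5, translate=0.1, rng=None):
    """Sample a scale in [1 - scale, 1 + scale] and a shift of (0.5 +/- translate) * size."""
    if rng is None:
        rng = np.random.default_rng()
    s = rng.uniform(1 - scale, 1 + scale)
    tx = rng.uniform(0.5 - translate, 0.5 + translate) * width
    ty = rng.uniform(0.5 - translate, 0.5 + translate) * height
    return scale_translate_matrix(s, tx, ty, width, height)


def transform_boxes(boxes, M, width, height):
    """Apply M to (N, 4) xyxy boxes; without rotation or shear the boxes stay axis-aligned."""
    pts = np.ones((len(boxes), 2, 3))
    pts[:, 0, :2] = boxes[:, :2]  # top-left corners (homogeneous)
    pts[:, 1, :2] = boxes[:, 2:]  # bottom-right corners (homogeneous)
    new = (pts @ M.T)[..., :2].reshape(len(boxes), 4)
    new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)  # clip to the image
    new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
    return new


boxes = np.array([[300.0, 300.0, 340.0, 340.0]])
M = scale_translate_matrix(2.0, 320, 320, 640, 640)  # zoom x2 about the centre, no net shift
print(transform_boxes(boxes, M, 640, 640))  # [[280. 280. 360. 360.]]
```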





Augment HSV (Hue, Saturation, Value): randomly adjust hue, saturation and value (brightness).
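A NumPy sketch of HSV augmentation with random per-channel gains, assuming the image is already in OpenCV's 8-bit HSV layout (H in [0, 180), S and V in [0, 255], e.g. from `cv2.cvtColor(img, cv2.COLOR_BGR2HSV)`). The gain defaults 0.015 / 0.7 / 0.4 are how I recall YOLOv5's `hsv_h`, `hsv_s`, `hsv_v` defaults; the function name is illustrative. Lookup tables keep the per-pixel work to one indexing operation per channel.

```python
import numpy as np


def augment_hsv(hsv, hgain=0.015, sgain=0.7, vgain=0.4, rng=None):
    """Randomly rescale hue, saturation and value of an 8-bit OpenCV-style HSV image."""
    if rng is None:
        rng = np.random.default_rng()
    r = rng.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1  # one random gain per channel, near 1
    x = np.arange(256, dtype=np.float64)
    lut_hue = ((x * r[0]) % 180).astype(np.uint8)         # hue wraps around the colour circle
    lut_sat = np.clip(x * r[1], 0, 255).astype(np.uint8)  # saturation is clipped
    lut_val = np.clip(x * r[2], 0, 255).astype(np.uint8)  # value is clipped
    return np.stack([lut_hue[hsv[..., 0]], lut_sat[hsv[..., 1]], lut_val[hsv[..., 2]]], axis=-1)


hsv = np.full((2, 2, 3), 100, np.uint8)
print(augment_hsv(hsv, rng=np.random.default_rng(0)))
```

The result is converted back with `cv2.cvtColor(out, cv2.COLOR_HSV2BGR)` before further processing.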


Random horizontal flip: randomly flip the image horizontally.
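A NumPy sketch of the flip and the matching label update for YOLO-format labels. The probability `p=0.5` is my understanding of YOLOv5's default `fliplr`; the function name is illustrative. Only the x centre changes, because a left-right flip preserves box width, height and vertical position.

```python
import numpy as np


def random_hflip(img, labels, p=0.5, rng=None):
    """Flip an image left-right with probability p and update YOLO-format labels.

    img:    (H, W, C) array
    labels: (N, 5) array of [class, cx, cy, w, h], coordinates normalized to [0, 1]
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p:
        img = np.fliplr(img)
        labels = labels.copy()
        labels[:, 1] = 1 - labels[:, 1]  # mirror the x centre
    return img, labels


img = np.zeros((4, 8, 3), np.uint8)
labels = np.array([[0, 0.25, 0.5, 0.1, 0.2]])
_, flipped = random_hflip(img, labels, p=1.0)
print(flipped)  # the x centre becomes 1 - 0.25 = 0.75
```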



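  • Multi-scale training (0.5~1.5x): assuming the input size is set to 640 × 640, the training size is sampled at random between 0.5 × 640 and 1.5 × 640. Note that the sampled sizes are always multiples of 32, because the network downsamples by at most 32×.
  • AutoAnchor (for training custom data): when training on your own dataset, the anchor templates can be regenerated by clustering the objects in that dataset. YOLOv5 first checks how well the current anchors fit the labels and only re-clusters when the fit is poor.
  • Warmup and Cosine LR scheduler: warm up at the start of training, then decay the learning rate with a cosine schedule.
  • EMA (Exponential Moving Average): can be thought of as adding momentum to the trained parameters, which makes their updates smoother.
  • Mixed precision: mixed-precision training reduces GPU memory usage and speeds up training, provided the GPU hardware supports it.
  • Evolve hyper-parameters: hyperparameter optimization. If you have no experience tuning hyperparameters, leave it alone and keep the defaults.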
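A plain-Python sketch of three of these mechanisms. The formulas (size sampled and rounded down to a multiple of 32, a cosine factor going from 1 to `lrf`, linear warmup, and an EMA whose decay ramps up as `decay * (1 - exp(-updates / tau))`) reflect my reading of YOLOv5's train.py and `ModelEMA`; the function names, the dict of floats standing in for model weights, and the defaults are illustrative assumptions, not the library API.

```python
import math
import random


def random_train_size(imgsz=640, gs=32):
    """Multi-scale training: a random size in [0.5, 1.5] * imgsz, rounded down to a multiple of gs."""
    return random.randrange(int(imgsz * 0.5), int(imgsz * 1.5) + gs) // gs * gs


def cosine_factor(epoch, epochs, lrf=0.01):
    """Cosine LR multiplier: 1 at epoch 0, decaying smoothly to lrf at the last epoch."""
    return ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1


def warmup_lr(it, warmup_iters, target_lr, start_lr=0.0):
    """Linear warmup: move from start_lr to target_lr over the first warmup_iters iterations."""
    if it >= warmup_iters:
        return target_lr
    return start_lr + (target_lr - start_lr) * it / warmup_iters


class EMA:
    """Exponential moving average of parameters (a dict of floats stands in for model weights)."""

    def __init__(self, params, decay=0.9999, tau=2000):
        self.shadow = dict(params)
        self.decay, self.tau, self.updates = decay, tau, 0

    def update(self, params):
        self.updates += 1
        # Small decay early on (the average follows the weights closely), approaching `decay` later
        d = self.decay * (1 - math.exp(-self.updates / self.tau))
        for k, v in params.items():
            self.shadow[k] = d * self.shadow[k] + (1 - d) * v


lr0 = 0.01  # base learning rate for this example
for epoch in (0, 50, 99):
    print(epoch, lr0 * cosine_factor(epoch, 100))
```

If the bias parameters warm up from a higher start (YOLOv5's `warmup_bias_lr` defaults to 0.1, as far as I know) over 3 warmup epochs, then after the first epoch the bias LR is about 0.1 + (0.01 - 0.1) / 3 = 0.07 and the other groups are near 0.01 / 3 ≈ 0.0033, which looks consistent with x/lr2 and x/lr0 at epoch 0 in the results.csv shown later.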


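  • Classes loss: classification loss, using BCE loss. Note that it is computed only for positive samples.
  • Objectness loss: obj loss, also BCE loss. Here the obj target of a positive sample is the CIoU between the box predicted by the network and the GT box, while all other samples have target 0. This loss is computed over all samples.
  • Location loss: localization loss, using CIoU loss. Note that it is computed only for positive samples.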
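A plain-Python sketch of the two ingredients for a single positive sample. `bbox_ciou` follows the standard CIoU definition, IoU minus the normalized squared centre distance minus a weighted aspect-ratio term, and the objectness target is the CIoU clamped at 0, which is how I understand YOLOv5's loss. The names are illustrative.

```python
import math


def bbox_ciou(b1, b2, eps=1e-7):
    """Complete IoU (CIoU) of two boxes given as (x1, y1, x2, y2)."""
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1] + eps
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1] + eps
    inter = (max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
             * max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1])))
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # squared diagonal of the smallest box enclosing both boxes
    c2 = (max(b1[2], b2[2]) - min(b1[0], b2[0])) ** 2 + (max(b1[3], b2[3]) - min(b1[1], b2[1])) ** 2 + eps
    # squared distance between the two box centres
    rho2 = ((b2[0] + b2[2] - b1[0] - b1[2]) ** 2 + (b2[1] + b2[3] - b1[1] - b1[3]) ** 2) / 4
    # aspect-ratio consistency term and its weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (v - iou + (1 + eps))
    return iou - (rho2 / c2 + v * alpha)


def bce(p, y, eps=1e-7):
    """Binary cross-entropy of a predicted probability p against a (possibly soft) target y."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


pred, gt = (0, 0, 10, 10), (2, 2, 12, 12)
ciou = bbox_ciou(pred, gt)
box_loss = 1 - ciou                  # location loss for this positive sample
obj_loss = bce(0.8, max(ciou, 0.0))  # objectness loss: predicted 0.8 vs CIoU target
print(round(ciou, 4), round(box_loss, 4), round(obj_loss, 4))
```

Unlike plain IoU, CIoU still gives a useful signal for non-overlapping boxes: it becomes negative and keeps decreasing as the centres move apart.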


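YOLOv5 versions before v6.0 (exclusive) use the Focus layer, which makes deployment considerably harder and requires several fiddly manual steps; see "Detailed Notes on the ncnn Implementation of Ultralytics YOLOv5 Object Detection". The concrete modification steps, following "Object Detection: Converting YOLOv5 to ncnn for Mobile Deployment", are: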

    # 1. Export to ONNX
    python models/export.py --weights yolov5s.pt --img 320 --batch 1
    # 2. Simplify the model
    python -m onnxsim yolov5s.onnx yolov5s-sim.onnx
    # 3. Convert the model to ncnn
    ./onnx2ncnn yolov5s-sim.onnx yolov5s.param yolov5s.bin
    # 4. Edit yolov5s.param:
    #    delete lines 4 to 13 (the Slice and Concat layers) and change the layer count on line 2
    #    from 173 to 164 (10 layers deleted, 1 custom layer added: 173 - (10 - 1) = 164),
    #    then add the custom layer
    #        YoloV5Focus focus 1 1 images 159
    #    where 159 is the output of the deleted Concat layer
    # 5. Support dynamic input sizes:
    #    in the Reshape layers, change 960, 240 and 60 (the values after 0=) to -1
    # 6. Optimize with ncnnoptimize
    ./ncnnoptimize yolov5s.param yolov5s.bin yolov5s-opt.param yolov5s-opt.bin 1

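From v6.0 on, the Focus layer is replaced by a 6x6 convolution, which makes things much easier: the model can be deployed directly with OpenCV's dnn module. See "Detecting objects with YOLOv5, OpenCV, Python and C++" and the accompanying code, yolov5-opencv-cpp-python.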


    import cv2
    import numpy as np

    # 1. Load the model (an ONNX file exported from v6.0+, e.g. with
    #    python export.py --weights yolov5s.pt --include onnx)
    net = cv2.dnn.readNet('yolov5s.onnx')

    # 2. Load the image and pad it into a square canvas
    def format_yolov5(source):
        row, col, _ = source.shape  # height, width
        _max = max(col, row)
        resized = np.zeros((_max, _max, 3), np.uint8)
        resized[0:row, 0:col] = source
        return resized

    image = format_yolov5(cv2.imread('image.jpg'))  # 'image.jpg' is a placeholder path
    # resize to 640x640, normalize to [0, 1] and swap the red and blue channels
    blob = cv2.dnn.blobFromImage(image, 1/255.0, (640, 640), swapRB=True)

    # 3. Run inference
    net.setInput(blob)
    predictions = net.forward()
    output = predictions[0]  # one row per candidate: cx, cy, w, h, objectness, class scores...

    # 4. Unwrap the detections
    def unwrap_detection(input_image, output_data):
        class_ids = []
        confidences = []
        boxes = []
        rows = output_data.shape[0]
        image_height, image_width, _ = input_image.shape
        x_factor = image_width / 640
        y_factor = image_height / 640
        for r in range(rows):
            row = output_data[r]
            confidence = row[4]
            if confidence >= 0.4:
                classes_scores = row[5:]
                _, _, _, max_indx = cv2.minMaxLoc(classes_scores)
                class_id = max_indx[1]
                if classes_scores[class_id] > .25:
                    confidences.append(float(confidence))
                    class_ids.append(class_id)
                    x, y, w, h = row[0].item(), row[1].item(), row[2].item(), row[3].item()
                    left = int((x - 0.5 * w) * x_factor)
                    top = int((y - 0.5 * h) * y_factor)
                    width = int(w * x_factor)
                    height = int(h * y_factor)
                    boxes.append([left, top, width, height])
        return class_ids, confidences, boxes

    class_ids, confidences, boxes = unwrap_detection(image, output)

    # 5. Non-maximum suppression
    indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.25, 0.45)
    result_class_ids = []
    result_confidences = []
    result_boxes = []
    for i in np.array(indexes).flatten():  # flat or N x 1 depending on the OpenCV version
        result_confidences.append(confidences[i])
        result_class_ids.append(class_ids[i])
        result_boxes.append(boxes[i])

    # 6. Visualize the results
    with open("classes.txt", "r") as f:
        class_list = [cname.strip() for cname in f.readlines()]
    colors = [(255, 255, 0), (0, 255, 0), (0, 255, 255), (255, 0, 0)]
    for i in range(len(result_class_ids)):
        box = result_boxes[i]
        class_id = result_class_ids[i]
        color = colors[class_id % len(colors)]
        conf = result_confidences[i]
        cv2.rectangle(image, tuple(box), color, 2)
        cv2.rectangle(image, (box[0], box[1] - 20), (box[0] + box[2], box[1]), color, -1)
        cv2.putText(image, class_list[class_id], (box[0] + 5, box[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0))
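Step 5 relies on `cv2.dnn.NMSBoxes`. For intuition, here is a greedy NMS sketch in NumPy over the same [left, top, width, height] boxes. It follows the usual algorithm (drop boxes below the score threshold, keep the highest-scoring box, discard the remaining boxes whose IoU with it exceeds the NMS threshold, repeat) and is not OpenCV's implementation. Like the NMSBoxes call above, it ignores class ids; `greedy_nms` is an illustrative name.

```python
import numpy as np


def greedy_nms(boxes, scores, score_thr=0.25, iou_thr=0.45):
    """Return indices of kept boxes; boxes are [left, top, width, height]."""
    if len(boxes) == 0:
        return []
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = x1 + boxes[:, 2], y1 + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    order = [i for i in np.argsort(-scores) if scores[i] >= score_thr]  # highest score first
    keep = []
    while order:
        i = order.pop(0)
        keep.append(int(i))
        if not order:
            break
        rest = np.array(order, dtype=int)
        iw = np.clip(np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]), 0, None)
        ih = np.clip(np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]), 0, None)
        inter = iw * ih
        iou = inter / (areas[i] + areas[rest] - inter)
        order = [j for j, o in zip(rest, iou) if o <= iou_thr]  # drop heavy overlaps
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 10, 10], [50, 50, 10, 10]]
print(greedy_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

Box 1 overlaps box 0 with IoU 81 / 119 ≈ 0.68 > 0.45, so it is suppressed, while box 2 does not overlap anything and survives.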



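For data annotation, see the Label Studio-based guide "[YOLOv5 Hands-On Notes] Even Beginners Can Train on Their Own Dataset!" or the labelImg-based guide "Hands-On Object Detection with Deep Learning (2): Data Annotation".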


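YOLOv5 labels use one .txt file per image, with the same name as the image (stored in a labels/ directory next to images/, as in the scripts below) and one line per object:

    <class of object 1> <normalized center x> <normalized center y> <normalized width> <normalized height>
    <class of object 2> <normalized center x> <normalized center y> <normalized width> <normalized height>

The center coordinates and the width are divided by the image width, and the center y and the height by the image height, so all values lie in [0, 1].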
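If your boxes are in pixel corner coordinates instead (for example from a Pascal VOC export), the conversion to and from this format is simple arithmetic. A sketch, with illustrative function names:

```python
def to_yolo_line(cls, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to one YOLO label line: class cx cy w h, normalized."""
    cx = (x1 + x2) / 2 / img_w
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def from_yolo_line(line, img_w, img_h):
    """Inverse of to_yolo_line: parse a label line back into (class, x1, y1, x2, y2) in pixels."""
    c, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(c), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


print(to_yolo_line(0, 100, 200, 300, 400, 640, 640))  # 0 0.312500 0.468750 0.312500 0.312500
```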

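Here we use circle detection as an example and walk through each step in detail.

First, generate and visualize the training data: draw a filled circle with a random colour, a random centre and a radius between 60 and 100 pixels; this circle is the object to detect. A total of 100,000 training images are generated.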

    import os
    import random

    import cv2
    import numpy as np
    from tqdm import tqdm


    def generate():
        # A filled circle with a random centre, radius (60-100 px) and colour on a black 640x640 image
        img = np.zeros((640, 640, 3), np.uint8)
        x = 100 + random.randint(0, 400)
        y = 100 + random.randint(0, 400)
        radius = random.randint(60, 100)
        r = random.randint(0, 255)
        g = random.randint(0, 255)
        b = random.randint(0, 255)
        cv2.circle(img, (x, y), radius, (b, g, r), -1)
        return img, [x, y, radius]


    def generate_batch(num=100000):
        images_dir = "data/circle/images"
        labels_dir = "data/circle/labels"
        os.makedirs(images_dir, exist_ok=True)
        os.makedirs(labels_dir, exist_ok=True)
        for i in tqdm(range(num)):
            img, labels = generate()
            cv2.imwrite(images_dir + "/" + str(i) + ".jpg", img)
            x, y, radius = labels
            with open(labels_dir + "/" + str(i) + ".txt", 'w') as f:
                # class 0, normalized centre x/y, normalized width/height (both equal to the diameter)
                f.write("0 " + str(x / 640) + " " + str(y / 640) + " " + str(2 * radius / 640) + " " + str(2 * radius / 640) + "\n")


    def show_gt(dir='data/circle'):
        # Draw the ground-truth boxes on each image and save the result to <dir>/gt for a visual check
        files = os.listdir(dir + "/images")
        gtdir = dir + "/gt"
        os.makedirs(gtdir, exist_ok=True)
        for file in tqdm(files):
            img = cv2.imread(dir + "/images/" + file)
            h, w, _ = img.shape
            labelpath = dir + "/labels/" + file[:-3] + "txt"
            with open(labelpath) as f:
                lines = f.readlines()
            for line in lines:
                items = line.split()
                c = int(items[0])
                cx = float(items[1])
                cy = float(items[2])
                cw = float(items[3])
                ch = float(items[4])
                x1 = int((cx - cw / 2) * w)
                y1 = int((cy - ch / 2) * h)
                x2 = int((cx + cw / 2) * w)
                y2 = int((cy + ch / 2) * h)
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.imwrite(gtdir + "/" + file, img)


    if __name__ == "__main__":
        generate_batch()
        show_gt()


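The corresponding circle.yaml (train and val point to the same images here, so the validation metrics will be optimistic):

    train: data/circle/images/
    val: data/circle/images/
    # number of classes
    nc: 1
    # class names
    names: ['circle']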


To detect two classes, circles and rectangles, generate the data with:

    import os
    import random

    import cv2
    import numpy as np
    from tqdm import tqdm


    def generate_circle():
        img = np.zeros((640, 640, 3), np.uint8)
        x = 100 + random.randint(0, 400)
        y = 100 + random.randint(0, 400)
        radius = random.randint(60, 100)
        r = random.randint(0, 255)
        g = random.randint(0, 255)
        b = random.randint(0, 255)
        cv2.circle(img, (x, y), radius, (b, g, r), -1)
        return img, [x, y, radius * 2, radius * 2]


    def generate_rectangle():
        img = np.zeros((640, 640, 3), np.uint8)
        w = random.randint(80, 200)
        h = random.randint(80, 200)
        # keep x2 = x1 + w and y2 = y1 + h inside the 640x640 image so the label matches the drawing
        x1 = 100 + random.randint(0, 339)
        y1 = 100 + random.randint(0, 339)
        x2 = x1 + w
        y2 = y1 + h
        r = random.randint(0, 255)
        g = random.randint(0, 255)
        b = random.randint(0, 255)
        cx = (x1 + x2) // 2
        cy = (y1 + y2) // 2
        cv2.rectangle(img, (x1, y1), (x2, y2), (b, g, r), -1)
        return img, [cx, cy, w, h]


    def generate_batch(num=100000):
        images_dir = "data/shape/images"
        labels_dir = "data/shape/labels"
        os.makedirs(images_dir, exist_ok=True)
        os.makedirs(labels_dir, exist_ok=True)
        for i in tqdm(range(num)):
            # even indices: circle (class 0), odd indices: rectangle (class 1)
            if i % 2 == 0:
                img, labels = generate_circle()
            else:
                img, labels = generate_rectangle()
            cv2.imwrite(images_dir + "/" + str(i) + ".jpg", img)
            cx, cy, w, h = labels
            with open(labels_dir + "/" + str(i) + ".txt", 'w') as f:
                f.write(str(i % 2) + " " + str(cx / 640) + " " + str(cy / 640) + " " + str(w / 640) + " " + str(h / 640) + "\n")


    def show_gt(dir='data/shape'):
        # Draw the ground-truth boxes and class ids, saving the results to <dir>/gt
        files = os.listdir(dir + "/images")
        gtdir = dir + "/gt"
        os.makedirs(gtdir, exist_ok=True)
        for file in tqdm(files):
            img = cv2.imread(dir + "/images/" + file)
            h, w, _ = img.shape
            labelpath = dir + "/labels/" + file[:-3] + "txt"
            with open(labelpath) as f:
                lines = f.readlines()
            for line in lines:
                items = line.split()
                c = int(items[0])
                cx = float(items[1])
                cy = float(items[2])
                cw = float(items[3])
                ch = float(items[4])
                x1 = int((cx - cw / 2) * w)
                y1 = int((cy - ch / 2) * h)
                x2 = int((cx + cw / 2) * w)
                y2 = int((cy + ch / 2) * h)
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
                cv2.putText(img, str(c), (x1, y1), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 0, 255))
            cv2.imwrite(gtdir + "/" + file, img)


    if __name__ == "__main__":
        generate_batch()
        show_gt()

The corresponding shape.yaml; note that the number of classes is 2:

    train: data/shape/images/
    val: data/shape/images/
    # number of classes
    nc: 2
    # class names
    names: ['circle', 'rectangle']



Train on the circle data from scratch (--weights '' means no pretrained weights are loaded):

python train.py --data circle.yaml --cfg yolov5s.yaml --weights '' --batch-size 64

For the two classes, circles and rectangles, the command is

python train.py --data shape.yaml --cfg yolov5s.yaml --weights '' --batch-size 64





results.csv of the circle run for epochs 0 to 19 (cls_loss stays 0 because there is only one class):

    epoch, train/box_loss, train/obj_loss, train/cls_loss, metrics/precision, metrics/recall, metrics/mAP_0.5, metrics/mAP_0.5:0.95, val/box_loss, val/obj_loss, val/cls_loss, x/lr0, x/lr1, x/lr2
    0, 0.03892, 0.011817, 0, 0.99998, 0.99978, 0.995, 0.92987, 0.0077891, 0.0030948, 0, 0.0033312, 0.0033312, 0.070019
    1, 0.017302, 0.0049876, 0, 1, 0.9999, 0.995, 0.99105, 0.0031843, 0.0015662, 0, 0.0066644, 0.0066644, 0.040019
    2, 0.011272, 0.0034826, 0, 1, 0.99994, 0.995, 0.99499, 0.0020194, 0.0010969, 0, 0.0099969, 0.0099969, 0.010018
    3, 0.0080153, 0.0027186, 0, 1, 0.99994, 0.995, 0.995, 0.0013095, 0.00083033, 0, 0.0099978, 0.0099978, 0.0099978
    4, 0.0067639, 0.0023831, 0, 1, 0.99996, 0.995, 0.995, 0.00099513, 0.00068878, 0, 0.0099978, 0.0099978, 0.0099978
    5, 0.0061637, 0.0022279, 0, 1, 0.99996, 0.995, 0.995, 0.00090497, 0.00064193, 0, 0.0099961, 0.0099961, 0.0099961
    6, 0.0058844, 0.002144, 0, 0.99999, 0.99998, 0.995, 0.995, 0.0009117, 0.00063328, 0, 0.0099938, 0.0099938, 0.0099938
    7, 0.0056247, 0.00208, 0, 0.99999, 0.99999, 0.995, 0.995, 0.00086355, 0.00061343, 0, 0.0099911, 0.0099911, 0.0099911
    8, 0.0054567, 0.0020223, 0, 1, 0.99999, 0.995, 0.995, 0.00081632, 0.00059592, 0, 0.0099879, 0.0099879, 0.0099879
    9, 0.0053597, 0.0019864, 0, 1, 1, 0.995, 0.995, 0.00081379, 0.00058942, 0, 0.0099842, 0.0099842, 0.0099842
    10, 0.0053103, 0.0019559, 0, 1, 1, 0.995, 0.995, 0.0008175, 0.00058669, 0, 0.00998, 0.00998, 0.00998
    11, 0.0052146, 0.0019445, 0, 1, 1, 0.995, 0.995, 0.00083248, 0.00058731, 0, 0.0099753, 0.0099753, 0.0099753
    12, 0.0050852, 0.0019065, 0, 1, 1, 0.995, 0.995, 0.00085092, 0.00058853, 0, 0.0099702, 0.0099702, 0.0099702
    13, 0.0050589, 0.0019031, 0, 1, 1, 0.995, 0.995, 0.00086915, 0.00059267, 0, 0.0099645, 0.0099645, 0.0099645
    14, 0.0049664, 0.0018693, 0, 1, 1, 0.995, 0.995, 0.00090856, 0.00059815, 0, 0.0099584, 0.0099584, 0.0099584
    15, 0.0049839, 0.0018568, 0, 1, 1, 0.995, 0.995, 0.00093147, 0.00060425, 0, 0.0099517, 0.0099517, 0.0099517
    16, 0.0049079, 0.0018459, 0, 1, 1, 0.995, 0.995, 0.0009656, 0.00061124, 0, 0.0099446, 0.0099446, 0.0099446
    17, 0.0048693, 0.0018277, 0, 1, 1, 0.995, 0.995, 0.00099703, 0.00061948, 0, 0.009937, 0.009937, 0.009937
    18, 0.0048052, 0.0018103, 0, 1, 1, 0.995, 0.995, 0.0010246, 0.00062618, 0, 0.0099289, 0.0099289, 0.0099289
    19, 0.0047608, 0.0017947, 0, 1, 1, 0.995, 0.995, 0.0010439, 0.00063123, 0, 0.0099203, 0.0099203, 0.0099203
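To see which epoch ranks best in a log like this, here is a standard-library sketch that parses results.csv and scores each epoch. The weighting 0.1 × mAP@0.5 + 0.9 × mAP@0.5:0.95 is what I understand YOLOv5's `fitness()` uses for choosing best.pt; treat it as an assumption. In the log above, mAP@0.5 is already 0.995 at epoch 0 and mAP@0.5:0.95 reaches 0.995 at epoch 3, after which the score stays flat.

```python
import csv
import io


def best_epoch(csv_text):
    """Return (epoch, fitness) of the best row in a YOLOv5-style results.csv."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]  # column names are padded with spaces
    best = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        rec = dict(zip(header, (float(v) for v in row)))
        # assumed weighting: 0.1 * mAP@0.5 + 0.9 * mAP@0.5:0.95
        fitness = 0.1 * rec["metrics/mAP_0.5"] + 0.9 * rec["metrics/mAP_0.5:0.95"]
        if best is None or fitness > best[1]:
            best = (int(rec["epoch"]), fitness)
    return best
```

Usage: `best_epoch(open("results.csv").read())`, where the path is whatever run directory train.py wrote to. With a strict `>` comparison, the earliest of several tied epochs is reported.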








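Run detection with the trained weights. By default train.py writes to runs/train/exp*; the path below assumes the run was saved as exps/yolov5s_circle (for example with --project exps --name yolov5s_circle):

python detect.py --weights exps/yolov5s_circle/weights/best.pt --source data/circle/images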



Inference with an exported ONNX model through onnxruntime. Run it from the yolov5 repository root, because it reuses non_max_suppression from utils.general:

    import os

    import cv2
    import numpy as np
    import onnxruntime
    import torch
    from utils.general import non_max_suppression


    def detect(img, ort_session):
        # BGR (OpenCV) -> RGB, HWC -> CHW, add a batch dimension, scale to [0, 1]
        img = img[:, :, ::-1].transpose(2, 0, 1)[None]
        img_tensor = np.ascontiguousarray(img, dtype=np.float32) / 255
        ort_inputs = {ort_session.get_inputs()[0].name: img_tensor}
        pred = torch.tensor(ort_session.run(None, ort_inputs)[0])
        dets = non_max_suppression(pred, 0.25, 0.45)  # conf_thres, iou_thres
        return dets[0]  # (N, 6): x1, y1, x2, y2, score, class


    def demo():
        # execution providers in priority order: TensorRT, then CUDA, then CPU
        providers = ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
        ort_session = onnxruntime.InferenceSession("yolov5s.onnx", providers=providers)
        img = cv2.imread("data/images/bus.jpg")
        img = cv2.resize(img, (640, 640))  # plain resize, the aspect ratio is not preserved
        dets = detect(img, ort_session)
        for det in dets:
            x1, y1, x2, y2 = (int(v) for v in det[:4])
            score = float(det[4])
            cls = int(det[5])
            info = "{}_{:.2f}".format(cls, score * 100)
            cv2.rectangle(img, (x1, y1), (x2, y2), (255, 255, 0))
            cv2.putText(img, info, (x1, y1), 1, 1, (0, 0, 255))
        os.makedirs("runs/detect", exist_ok=True)
        cv2.imwrite("runs/detect/bus.jpg", img)


    if __name__ == "__main__":
        demo()
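Because `demo()` resizes the image straight to 640x640 without letterboxing, the detected boxes are in 640x640 coordinates. To draw them on the original image, x and y must be scaled separately. A small NumPy sketch; the helper name is illustrative, and it does not handle letterbox preprocessing, where the padding offset would also have to be subtracted.

```python
import numpy as np


def scale_boxes_back(dets, orig_w, orig_h, input_size=640):
    """Map xyxy boxes from a plain (non-letterboxed) resize back to the original image size.

    dets: (N, >=4) array whose first four columns are x1, y1, x2, y2 in input_size coordinates.
    """
    out = np.array(dets, dtype=float, copy=True)
    out[:, [0, 2]] *= orig_w / input_size  # horizontal scale
    out[:, [1, 3]] *= orig_h / input_size  # vertical scale
    return out


d = np.array([[64, 64, 320, 320, 0.9, 0]])
print(scale_boxes_back(d, 1280, 720))  # x scaled by 2, y by 1.125; score and class untouched
```

With the torch output of `detect()`, pass `dets.numpy()` together with the width and height of the image before `cv2.resize`.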



