
3D Vision, Part 1: Getting Started with Human Pose Estimation Using MediaPipe (Single Frame and Real-Time Video)

Developing with the MediaPipe Toolkit

What is MediaPipe?

MediaPipe is an open-source framework for applying machine-learning models to multimedia, developed by Google Research. It is designed for time-series data such as video and audio, and its cross-platform architecture runs on desktop/server, Android, iOS, and embedded devices.

We use MediaPipe's Solutions (task-specific models). There are 16 Solutions, including:

  1. Face Detection
  2. Face Mesh (a dense mesh of landmarks over the face)
  3. Iris
  4. Pose (the one this post uses)
  5. Holistic
  6. Selfie Segmentation
  7. Hair Segmentation
  8. Object Detection
  9. Box Tracking
  10. Instant Motion Tracking
  11. Objectron (3D object detection)
  12. Feature Matching (KNIFT)
  13. AutoFlip
  14. MediaSequence
  15. YouTube-8M

Human Pose Estimation Code

Key Code Explained

1. solutions.pose

```python
import mediapipe as mp
mp_pose = mp.solutions.pose
```

The MediaPipe pose estimation module (.solutions.pose) divides the human body into 33 keypoints (indices 0-32), as shown in Figure 1.

Figure 1.

A posture can often be classified by checking joint angles, as in Figure 2. (For full action recognition: collect images of different actions, run pose detection on them, label each image according to its joint angles, and train on the resulting dataset to recognize actions.)

Figure 2.
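The angle-based idea above can be sketched as a small helper: given three keypoints, compute the angle at the middle one (e.g. the elbow angle from shoulder, elbow, wrist). The coordinates below are made-up normalized values for illustration, not output from MediaPipe.

```python
import math

def joint_angle(a, b, c):
    """Angle at point b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_angle = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(cos_angle))

# Hypothetical shoulder / elbow / wrist positions (normalized coordinates)
print(joint_angle((0.5, 0.2), (0.6, 0.4), (0.5, 0.6)))  # about 126.9 degrees
```

A real classifier would compare such angles (elbow, knee, hip, ...) against per-action thresholds, or feed them to a learned model as described above.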


2. Parameters of mp_pose.Pose():

- static_image_mode: whether the input is a static image or a continuous video stream;
- model_complexity: which pose model to use; of the three, 0 is the fastest but least accurate, 1 is intermediate in both speed and accuracy, and 2 is the slowest but most accurate;
- smooth_landmarks: whether to smooth keypoints across frames;
- enable_segmentation: whether to also compute a person segmentation mask;
- min_detection_confidence: confidence threshold for detection;
- min_tracking_confidence: confidence threshold for frame-to-frame tracking.

```python
pose = mp_pose.Pose(static_image_mode=True,
                    model_complexity=1,
                    smooth_landmarks=True,
                    enable_segmentation=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)
```

3. solutions.drawing_utils

Drawing utilities (can plot the landmarks in 3D, or draw the keypoints and skeleton directly on the original image):

```python
drawing = mp.solutions.drawing_utils
...
# draw keypoints and skeleton on the original image
drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
# 3D plot
drawing.plot_landmarks(results.pose_world_landmarks, mp_pose.POSE_CONNECTIONS)
```

The rest of the code is standard OpenCV.


Complete Code

```python
import cv2
import mediapipe as mp

if __name__ == '__main__':
    mp_pose = mp.solutions.pose
    pose = mp_pose.Pose(static_image_mode=True,
                        model_complexity=1,
                        smooth_landmarks=True,
                        # enable_segmentation=True,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5)
    drawing = mp.solutions.drawing_utils
    # read image and convert BGR to RGB (MediaPipe expects RGB input)
    img = cv2.imread("1.jpg")
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    cv2.imshow("input", img)
    results = pose.process(img)
    drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    cv2.imshow("keypoint", img)
    drawing.plot_landmarks(results.pose_world_landmarks, mp_pose.POSE_CONNECTIONS)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```

Note: cv2.imshow expects BGR, so the two windows above display the image with red and blue swapped; convert back with cv2.cvtColor(img, cv2.COLOR_RGB2BGR) before displaying if that matters.

Results

The original image

Keypoints drawn on the original image

3D coordinates


Parsing the Coordinates

Again using the running person as the example.

Code

```python
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False,
                    model_complexity=0,
                    smooth_landmarks=True,
                    enable_segmentation=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)
drawing = mp.solutions.drawing_utils

if __name__ == '__main__':
    img = cv2.imread('1.jpg')
    results = pose.process(img)
    print(results.pose_landmarks)  # all detected keypoints
```

All keypoint detections can be read from results.pose_landmarks, shown below. These are normalized coordinates: to recover pixel coordinates, multiply x by the image width and y by the image height. The z origin sits at the midpoint between the two hips; points closer to the camera get negative values, farther points positive ones.
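A minimal sketch of that conversion (the 1000x765 image size is an assumption, chosen to match the pixel values printed later in this post):

```python
def to_pixel(x_norm, y_norm, width, height):
    """Convert a normalized MediaPipe landmark (x, y) to integer pixel coordinates."""
    return int(x_norm * width), int(y_norm * height)

# Landmark 2 (left eye) from the dump below, on an assumed 1000x765 image
print(to_pixel(0.6029789447784424, 0.15230441093444824, 1000, 765))  # (602, 116)
```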

```
landmark {
  x: 0.5913416743278503
  y: 0.17020824551582336
  z: -0.10148811340332031
  visibility: 0.9999819993972778
}
landmark {
  x: 0.5996149778366089
  y: 0.15227100253105164
  z: -0.11516246944665909
  visibility: 0.9999804496765137
}
landmark {
  x: 0.6029789447784424
  y: 0.15230441093444824
  z: -0.1152396947145462
  visibility: 0.9999819993972778
}
landmark {
  x: 0.6068712472915649
  y: 0.15244033932685852
  z: -0.11535673588514328
  visibility: 0.9999862909317017
}
landmark {
  x: 0.5956720113754272
  y: 0.15167823433876038
  z: -0.07190258055925369
  visibility: 0.9999619722366333
}
landmark {
  x: 0.5958214998245239
  y: 0.1511225700378418
  z: -0.07187224179506302
  visibility: 0.9999505281448364
}
landmark {
  x: 0.5961448550224304
  y: 0.15046393871307373
  z: -0.07175181061029434
  visibility: 0.9999566078186035
}
landmark {
  x: 0.6275932788848877
  y: 0.16257867217063904
  z: -0.12434940785169601
  visibility: 0.9999855756759644
}
landmark {
  x: 0.612525463104248
  y: 0.15917572379112244
  z: 0.07216572016477585
  visibility: 0.9999306201934814
}
landmark {
  x: 0.5972875952720642
  y: 0.1862889975309372
  z: -0.10227096825838089
  visibility: 0.9999692440032959
}
landmark {
  x: 0.592987596988678
  y: 0.18590152263641357
  z: -0.04587363451719284
  visibility: 0.9999159574508667
}
landmark {
  x: 0.6709297895431519
  y: 0.25625985860824585
  z: -0.19476133584976196
  visibility: 0.9999887943267822
}
landmark {
  x: 0.6155267357826233
  y: 0.27312740683555603
  z: 0.23764272034168243
  visibility: 0.9998886585235596
}
landmark {
  x: 0.76192706823349
  y: 0.32696548104286194
  z: -0.23866404592990875
  visibility: 0.9991742968559265
}
landmark {
  x: 0.6149069666862488
  y: 0.37373778223991394
  z: 0.3292929530143738
  visibility: 0.2991478443145752
}
landmark {
  x: 0.7010799646377563
  y: 0.4162237048149109
  z: -0.18799468874931335
  visibility: 0.9904139637947083
}
landmark {
  x: 0.5629619359970093
  y: 0.34696149826049805
  z: 0.2674705684185028
  visibility: 0.40632331371307373
}
landmark {
  x: 0.6892314553260803
  y: 0.43785160779953003
  z: -0.21043820679187775
  visibility: 0.9691246151924133
}
landmark {
  x: 0.5501535534858704
  y: 0.334354966878891
  z: 0.26719772815704346
  visibility: 0.36899450421333313
}
landmark {
  x: 0.6795801520347595
  y: 0.4296255111694336
  z: -0.19730502367019653
  visibility: 0.9676418304443359
}
landmark {
  x: 0.5508479475975037
  y: 0.3242868185043335
  z: 0.23829618096351624
  visibility: 0.37453970313072205
}
landmark {
  x: 0.6808692216873169
  y: 0.4231947660446167
  z: -0.17752741277217865
  visibility: 0.9631088972091675
}
landmark {
  x: 0.555029034614563
  y: 0.3278791904449463
  z: 0.2512615919113159
  visibility: 0.3893350064754486
}
landmark {
  x: 0.6576938033103943
  y: 0.5196953415870667
  z: -0.14214162528514862
  visibility: 0.9999549388885498
}
landmark {
  x: 0.6405556797981262
  y: 0.5202372074127197
  z: 0.14222070574760437
  visibility: 0.9999477863311768
}
landmark {
  x: 0.5241203904151917
  y: 0.6201881766319275
  z: -0.15834815800189972
  visibility: 0.9693808555603027
}
landmark {
  x: 0.7318142056465149
  y: 0.6902449727058411
  z: 0.11025446653366089
  visibility: 0.9495575428009033
}
landmark {
  x: 0.5604070425033569
  y: 0.815612256526947
  z: -0.03564663231372833
  visibility: 0.9501809477806091
}
landmark {
  x: 0.8767399191856384
  y: 0.8223288655281067
  z: 0.1430264711380005
  visibility: 0.9820705652236938
}
landmark {
  x: 0.5801612138748169
  y: 0.8386951684951782
  z: -0.026119956746697426
  visibility: 0.9103515148162842
}
landmark {
  x: 0.8959819078445435
  y: 0.875182569026947
  z: 0.14569874107837677
  visibility: 0.9787982106208801
}
landmark {
  x: 0.5071742534637451
  y: 0.875374436378479
  z: -0.06918345391750336
  visibility: 0.9140819907188416
}
landmark {
  x: 0.88722825050354
  y: 0.9008339643478394
  z: 0.09929685294628143
  visibility: 0.9545168280601501
}
```

Call:

```python
print(mp_pose.POSE_CONNECTIONS)
```

mp_pose.POSE_CONNECTIONS is the skeleton: the set of edges describing how the 33 keypoints are connected.

```python
frozenset({(15, 21), (16, 20), (18, 20), (3, 7), (14, 16), (23, 25),
           (28, 30), (11, 23), (27, 31), (6, 8), (15, 17), (24, 26),
           (16, 22), (4, 5), (5, 6), (29, 31), (12, 24), (23, 24),
           (0, 1), (9, 10), (1, 2), (0, 4), (11, 13), (30, 32),
           (28, 32), (15, 19), (16, 18), (25, 27), (26, 28), (12, 14),
           (17, 19), (2, 3), (11, 12), (27, 29), (13, 15)})
```
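A sketch of how this edge list can be used to draw the skeleton manually: each pair indexes two landmarks, so looking both up in a table of pixel coordinates yields one line segment. The subset of connections and the pixel coordinates below are hypothetical stand-ins for real detections.

```python
# A few shoulder/arm connections taken from the frozenset above
connections = {(11, 12), (11, 13), (13, 15), (12, 14), (14, 16)}

# Hypothetical pixel coordinates for landmarks 11-16
points = {11: (300, 200), 12: (400, 200), 13: (280, 300),
          14: (420, 300), 15: (270, 390), 16: (430, 390)}

segments = [(points[a], points[b]) for a, b in sorted(connections)]
for start, end in segments:
    # with OpenCV this would be: cv2.line(img, start, end, (0, 255, 0), 2)
    print(start, "->", end)
```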

调用

  1. print(results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_EYE])
  2. print(results.pose_landmarks.landmark[2])
  3. # results.pose_landmarks.landmark[mp_pose.PoseLandmark.LEFT_EYE]
  4. # or
  5. # results.pose_landmarks.landmark[2]

查看左眼属性

  1. x: 0.6029789447784424
  2. y: 0.15230441093444824
  3. z: -0.1152396947145462
  4. visibility: 0.9999819993972778

调用左眼的x轴

  1. print(results.pose_landmarks.landmark[2].x)
  2. # results.pose_landmarks.landmark[2].x

查看左眼的x轴

0.6029789447784424

Recover the left eye's pixel coordinates:

```python
height, width, channels = img.shape
print("x:{},y:{}".format(results.pose_landmarks.landmark[2].x * width,
                         results.pose_landmarks.landmark[2].y * height))
```

Output:

x:602.9789447784424,y:116.5128743648529

Get the left eye's real-world coordinates (world coordinates have their origin at the midpoint between the two hips):

```python
print(results.pose_world_landmarks.landmark[2])
```

Output:

```
x: -0.1573336124420166
y: -0.6765229105949402
z: -0.09651455283164978
visibility: 0.9999819993972778
```
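Because pose_world_landmarks are metric coordinates centered between the hips, they allow rough real-world measurements such as limb lengths. A sketch (the two shoulder coordinates below are hypothetical, standing in for results.pose_world_landmarks.landmark[11] and [12]):

```python
import math

def world_distance(p1, p2):
    """Euclidean distance between two (x, y, z) world landmarks, in meters."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(p1, p2)))

left_shoulder = (-0.14, -0.45, 0.02)    # hypothetical world coordinates, meters
right_shoulder = (0.17, -0.47, -0.03)
print(round(world_distance(left_shoulder, right_shoulder), 3))  # 0.315
```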

Interactive 3D Visualization

The core idea in both variants is to extract the pose_landmarks.landmark coordinates from MediaPipe's result object into plain Python lists.

There are two ways to collect the 3D coordinates, and we can measure the FPS of each.

1. map

```python
import time
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False,
                    model_complexity=0,
                    smooth_landmarks=True,
                    enable_segmentation=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)
drawing = mp.solutions.drawing_utils

# extract the x, y, z coordinates
def get_x(each):
    return each.x

def get_y(each):
    return each.y

def get_z(each):
    return each.z

if __name__ == '__main__':
    t0 = time.time()
    img = cv2.imread('1.jpg')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = pose.process(img)
    drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    coords = np.array(results.pose_landmarks.landmark)
    x = np.array(list(map(get_x, coords)))
    y = np.array(list(map(get_y, coords)))
    z = np.array(list(map(get_z, coords)))
    point = np.vstack((x, y, z)).T
    print("FPS: {}".format(str(int(1 / (time.time() - t0)))))
    print(point)
```

Results

```
FPS: 4
[[ 0.59134167  0.17020825 -0.10148811]
 [ 0.59961498  0.152271   -0.11516247]
 [ 0.60297894  0.15230441 -0.11523969]
 [ 0.60687125  0.15244034 -0.11535674]
 [ 0.59567201  0.15167823 -0.07190258]
 [ 0.5958215   0.15112257 -0.07187224]
 [ 0.59614486  0.15046394 -0.07175181]
 [ 0.62759328  0.16257867 -0.12434941]
 [ 0.61252546  0.15917572  0.07216572]
 [ 0.5972876   0.186289   -0.10227097]
 [ 0.5929876   0.18590152 -0.04587363]
 [ 0.67092979  0.25625986 -0.19476134]
 [ 0.61552674  0.27312741  0.23764272]
 [ 0.76192707  0.32696548 -0.23866405]
 [ 0.61490697  0.37373778  0.32929295]
 [ 0.70107996  0.4162237  -0.18799469]
 [ 0.56296194  0.3469615   0.26747057]
 [ 0.68923146  0.43785161 -0.21043821]
 [ 0.55015355  0.33435497  0.26719773]
 [ 0.67958015  0.42962551 -0.19730502]
 [ 0.55084795  0.32428682  0.23829618]
 [ 0.68086922  0.42319477 -0.17752741]
 [ 0.55502903  0.32787919  0.25126159]
 [ 0.6576938   0.51969534 -0.14214163]
 [ 0.64055568  0.52023721  0.14222071]
 [ 0.52412039  0.62018818 -0.15834816]
 [ 0.73181421  0.69024497  0.11025447]
 [ 0.56040704  0.81561226 -0.03564663]
 [ 0.87673992  0.82232887  0.14302647]
 [ 0.58016121  0.83869517 -0.02611996]
 [ 0.89598191  0.87518257  0.14569874]
 [ 0.50717425  0.87537444 -0.06918345]
 [ 0.88722825  0.90083396  0.09929685]]
```

2. for

```python
import time
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False,
                    model_complexity=0,
                    smooth_landmarks=True,
                    enable_segmentation=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)
drawing = mp.solutions.drawing_utils

point_x = []
point_y = []
point_z = []

# extract the x, y, z coordinates
def get_x_y_z(each):
    point_x.append(each.x)
    point_y.append(each.y)
    point_z.append(each.z)
    return point_x, point_y, point_z

if __name__ == '__main__':
    t0 = time.time()
    img = cv2.imread('1.jpg')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = pose.process(img)
    drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    coords = np.array(results.pose_landmarks.landmark)
    for index, each in enumerate(coords):
        x, y, z = get_x_y_z(each)
    point = np.vstack((x, y, z)).T
    print("FPS: {}".format(str(int(1 / (time.time() - t0)))))
    print(point)
```

Results: `FPS: 4`, followed by the same coordinate array as the map version above.

The FPS of the two approaches is about the same.

Then render the points in 3D:

```python
import cv2
import numpy as np
import open3d
import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=False,
                    model_complexity=0,
                    smooth_landmarks=True,
                    enable_segmentation=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)
drawing = mp.solutions.drawing_utils

point_x = []
point_y = []
point_z = []

# extract the x, y, z coordinates
def get_x_y_z(each):
    point_x.append(each.x)
    point_y.append(each.y)
    point_z.append(each.z)
    return point_x, point_y, point_z

if __name__ == '__main__':
    img = cv2.imread('1.jpg')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = pose.process(img)
    drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    coords = np.array(results.pose_landmarks.landmark)
    for index, each in enumerate(coords):
        x, y, z = get_x_y_z(each)
    point = np.vstack((x, y, z)).T
    # interactive 3D point-cloud visualization
    point_cloud = open3d.geometry.PointCloud()
    point_cloud.points = open3d.utility.Vector3dVector(point)
    open3d.visualization.draw_geometries([point_cloud])
```

Results

Interactive 3D visualization of the keypoints


Improving the Rendering

To make the predicted keypoints more prominent, each one is drawn as a larger filled dot in a color from the Morandi palette.

The Morandi color palette

Because the img array here is in BGR channel order, the palette's RGB values must be reversed before being passed to OpenCV.
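The swap is just a tuple reversal; a quick sketch using the nose color from the optimized code below (its RGB form is shown here only for illustration):

```python
def rgb_to_bgr(color):
    """Reverse an (R, G, B) tuple into OpenCV's (B, G, R) channel order."""
    return color[::-1]

morandi_rgb = (164, 152, 133)   # a muted Morandi-style tone in RGB
print(rgb_to_bgr(morandi_rgb))  # (133, 152, 164), the order passed to cv2
```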

Optimized code

```python
import cv2
import numpy as np
import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=True,
                    model_complexity=2,
                    smooth_landmarks=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)
drawing = mp.solutions.drawing_utils

img = cv2.imread("1.jpg")
height, width, channels = img.shape
point_x = []
point_y = []

# extract the x, y pixel coordinates
def get_x_y(each):
    point_x.append(int(each.x * width))
    point_y.append(int(each.y * height))
    return point_x, point_y

if __name__ == '__main__':
    print("height:{}, width:{}".format(height, width))
    results = pose.process(img)
    drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    coords = np.array(results.pose_landmarks.landmark)
    for index, each in enumerate(coords):
        x, y = get_x_y(each)
    points = np.vstack((x, y)).T
    radius = 5
    for index, point in enumerate(points):
        # nose
        if index == 0:
            img = cv2.circle(img, (point[0], point[1]), radius, (133, 152, 164), -1)
        # shoulders
        elif index in [11, 12]:
            img = cv2.circle(img, (point[0], point[1]), radius, (117, 149, 188), -1)
        # hip joints
        elif index in [23, 24]:
            img = cv2.circle(img, (point[0], point[1]), radius, (177, 202, 215), -1)
        # elbows
        elif index in [13, 14]:
            img = cv2.circle(img, (point[0], point[1]), radius, (221, 227, 229), -1)
        # knees
        elif index in [25, 26]:
            img = cv2.circle(img, (point[0], point[1]), radius, (117, 175, 198), -1)
        # wrists and ankles
        elif index in [15, 16, 27, 28]:
            img = cv2.circle(img, (point[0], point[1]), radius, (146, 134, 118), -1)
        # left hand
        elif index in [17, 19, 21]:
            img = cv2.circle(img, (point[0], point[1]), radius, (122, 137, 128), -1)
        # right hand
        elif index in [18, 20, 22]:
            img = cv2.circle(img, (point[0], point[1]), radius, (115, 117, 117), -1)
        # left foot (27 was already handled above as an ankle)
        elif index in [27, 29, 31]:
            img = cv2.circle(img, (point[0], point[1]), radius, (205, 209, 212), -1)
        # right foot (28 was already handled above as an ankle)
        elif index in [28, 30, 32]:
            img = cv2.circle(img, (point[0], point[1]), radius, (132, 115, 132), -1)
        # mouth
        elif index in [9, 10]:
            img = cv2.circle(img, (point[0], point[1]), radius, (79, 84, 113), -1)
        # face and eyes
        elif index in [1, 2, 3, 4, 5, 6, 7, 8]:
            img = cv2.circle(img, (point[0], point[1]), radius, (212, 195, 202), -1)
        # other
        else:
            img = cv2.circle(img, (point[0], point[1]), radius, (140, 47, 240), -1)
    cv2.imshow("aug_keypoint", img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```

Results


Real-Time Video Pose Estimation

1. Pose estimation from a live webcam

Core logic: open the camera, estimate the pose on every frame, and exit after 10 seconds.

If 10 seconds is too short, change this line:

```python
if ((time.time() - t0) // 1) == 10:
```

Complete code

```python
import cv2
import time
import mediapipe as mp

mp_pose = mp.solutions.pose
drawing = mp.solutions.drawing_utils
pose = mp_pose.Pose(static_image_mode=False,
                    model_complexity=1,
                    smooth_landmarks=True,
                    enable_segmentation=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)

def process_frame(img):
    results = pose.process(img)
    drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    return img

if __name__ == '__main__':
    t0 = time.time()
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        success, frame = cap.read()
        if not success:
            raise ValueError("Error reading frame from camera")
        frame = process_frame(frame)
        cv2.imshow("keypoint", frame)
        # quit after 10 seconds (break instead of sys.exit so the camera is released)
        if ((time.time() - t0) // 1) == 10:
            break
        cv2.waitKey(1)
    cap.release()
    cv2.destroyAllWindows()
```

This code runs correctly; the output is not shown here.


2. Pose estimation from a saved video

Core logic: open a saved video, estimate the pose frame by frame, and exit when the video ends.

Complete code

```python
import os
import sys
import cv2
import mediapipe as mp

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
print(BASE_DIR)
sys.path.append(BASE_DIR)

mp_pose = mp.solutions.pose
drawing = mp.solutions.drawing_utils
pose = mp_pose.Pose(static_image_mode=False,
                    model_complexity=1,
                    smooth_landmarks=True,
                    enable_segmentation=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)

def process_frame(img):
    results = pose.process(img)
    drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    return img

if __name__ == '__main__':
    video_dirs = os.path.join(BASE_DIR, "1.mp4")
    cap = cv2.VideoCapture(video_dirs)
    while cap.isOpened():
        success, frame = cap.read()
        if not success or frame is None:  # end of video
            break
        frame = process_frame(frame)
        cv2.imshow("keypoint", frame)
        cv2.waitKey(1)
    cap.release()
    cv2.destroyAllWindows()
```

Results

The source video is Liu Genghong's recently viral workout routine.

Human pose detection

This video exposes a limitation of the algorithm: it cannot reliably track both people in the frame at the same time.


3. Optimized real-time video pose estimation

This version applies the Morandi palette introduced above as the keypoint colors.

Complete code

```python
import cv2
import time
import numpy as np
from tqdm import tqdm
import mediapipe as mp

mp_pose = mp.solutions.pose
pose = mp_pose.Pose(static_image_mode=True,
                    model_complexity=2,
                    smooth_landmarks=True,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5)
drawing = mp.solutions.drawing_utils

def process_frame(img):
    height, width, channels = img.shape
    start = time.time()
    results = pose.process(img)
    if results.pose_landmarks:
        drawing.draw_landmarks(img, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
        coords = np.array(results.pose_landmarks.landmark)
        for index, each in enumerate(coords):
            cx = int(each.x * width)
            cy = int(each.y * height)
            cz = each.z
            radius = 5
            # nose
            if index == 0:
                img = cv2.circle(img, (cx, cy), radius, (133, 152, 164), -1)
            # shoulders
            elif index in [11, 12]:
                img = cv2.circle(img, (cx, cy), radius, (117, 149, 188), -1)
            # hip joints
            elif index in [23, 24]:
                img = cv2.circle(img, (cx, cy), radius, (177, 202, 215), -1)
            # elbows
            elif index in [13, 14]:
                img = cv2.circle(img, (cx, cy), radius, (221, 227, 229), -1)
            # knees
            elif index in [25, 26]:
                img = cv2.circle(img, (cx, cy), radius, (117, 175, 198), -1)
            # wrists and ankles
            elif index in [15, 16, 27, 28]:
                img = cv2.circle(img, (cx, cy), radius, (146, 134, 118), -1)
            # left hand
            elif index in [17, 19, 21]:
                img = cv2.circle(img, (cx, cy), radius, (122, 137, 128), -1)
            # right hand
            elif index in [18, 20, 22]:
                img = cv2.circle(img, (cx, cy), radius, (115, 117, 117), -1)
            # left foot
            elif index in [27, 29, 31]:
                img = cv2.circle(img, (cx, cy), radius, (205, 209, 212), -1)
            # right foot
            elif index in [28, 30, 32]:
                img = cv2.circle(img, (cx, cy), radius, (132, 115, 132), -1)
            # mouth
            elif index in [9, 10]:
                img = cv2.circle(img, (cx, cy), radius, (79, 84, 113), -1)
            # face and eyes
            elif index in [1, 2, 3, 4, 5, 6, 7, 8]:
                img = cv2.circle(img, (cx, cy), radius, (212, 195, 202), -1)
            # other
            else:
                img = cv2.circle(img, (cx, cy), radius, (140, 47, 240), -1)
    else:
        fail = "fail detection"
        img = cv2.putText(img, fail, (25, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 3)
    FPS = 1 / (time.time() - start)
    img = cv2.putText(img, "FPS" + str(int(FPS)), (25, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 3)
    return img

def out_video(input):
    file = input.split("/")[-1]
    output = "out-" + file
    print("It will start processing video: {}".format(input))
    cap = cv2.VideoCapture(input)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frame_size = (cap.get(cv2.CAP_PROP_FRAME_WIDTH), cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    # create the VideoWriter; VideoWriter_fourcc selects the video codec
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    fps = cap.get(cv2.CAP_PROP_FPS)
    out = cv2.VideoWriter(output, fourcc, fps, (int(frame_size[0]), int(frame_size[1])))
    # progress bar
    with tqdm(range(frame_count)) as pbar:
        while cap.isOpened():
            success, frame = cap.read()
            if not success:
                break
            try:
                frame = process_frame(frame)
                out.write(frame)
                pbar.update(1)
            except Exception:
                print("ERROR")
        pbar.close()
    cv2.destroyAllWindows()
    out.release()
    cap.release()
    print("{} finished!".format(output))

if __name__ == '__main__':
    video_dirs = "1.mp4"
    out_video(video_dirs)
```

Results

Progress bar during processing

Optimized human keypoint detection

Clearly better than the earlier version!

Apologies for the rough video editing; this was my first attempt.


Next Up

3D Vision, Part 2: Getting Started with Human Pose Estimation Using OpenPose (installation, compilation, and usage for single frames and real-time video): https://blog.csdn.net/XiaoyYidiaodiao/article/details/125565738?spm=1001.2014.3001.5502
