Fall Detection
Fall detection has become an important stepping stone in the research of action recognition — which is to train an AI to classify general actions such as walking and sitting down. What humans interpret as an obvious action of a person falling face flat is but a sequence of jumbled up pixels for an AI. To enable the AI to make sense of the input it receives, we need to teach it to detect certain patterns and shapes, and formulate its own rules.
To build an AI to detect falls, I decided not to go through the torture of amassing a large dataset and training a model specifically for this purpose. Instead, I used pose estimation as the building block.
Pose Estimation
Pose estimation is the localisation of human joints — commonly known as keypoints — in images and video frames. Typically, each person will be made up of a number of keypoints. Lines will be drawn between keypoint pairs, effectively mapping a rough shape of the person. There is a variety of pose estimation methods based on input and detection approach. For a more in-depth guide to pose estimation, do check out this article by Sudharshan Chandra Babu.
To make this model easily accessible to everyone, I chose RGB images as the input, processed by OpenCV. This means it is compatible with typical webcams, video files, and even HTTP/RTSP streams.
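All three source types can share one entry point, since OpenCV's `VideoCapture` takes an integer index for webcams and a string for files or network streams. A minimal sketch of such a normalising helper (the function name is my own, not from the original repository):

```python
def normalise_source(src: str):
    """Map a user-supplied source string to what cv2.VideoCapture expects:
    an integer index for a webcam, or the string itself for a file/stream."""
    if src.isdigit():   # e.g. "0" -> first webcam
        return int(src)
    return src          # video file path, or an http:// / rtsp:// URL

# Usage: cap = cv2.VideoCapture(normalise_source("rtsp://host/stream"))
```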
Pretrained Model
The pose estimation model that I utilised was OpenPifPaf by VITA lab at EPFL. The detection approach is bottom-up, which means that the AI first analyses the entire image and figures out all the keypoints it sees. Then, it groups keypoints together to determine the people in the image. This differs from a top-down approach, where the AI uses a basic person detector to identify regions of interest, before zooming in to identify individual keypoints. To learn more about how OpenPifPaf was developed, do check out their CVPR 2019 paper, or read their source code.
Multi-Stream Input
Most open-source models can only process a single input at any one time. To make this more versatile and scalable in the future, I made use of the multiprocessing library in Python to process multiple streams concurrently using subprocesses. This allows us to fully leverage multiple processors on machines with this capability.
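A minimal sketch of this pattern — one subprocess per input stream, results collected over a queue. The worker body here is a placeholder; in the real pipeline it would open the source with OpenCV and run pose estimation frame by frame:

```python
import multiprocessing as mp

def process_stream(source, results):
    """Placeholder worker: stands in for opening `source` and
    running pose estimation on each frame."""
    results.put((source, "done"))

def run_streams(sources):
    """Spawn one subprocess per stream so each can use its own core."""
    results = mp.Queue()
    procs = [mp.Process(target=process_stream, args=(s, results))
             for s in sources]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [results.get() for _ in sources]

if __name__ == "__main__":
    print(run_streams(["cam0.mp4", "rtsp://cam1/live"]))  # hypothetical inputs
```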
Person Tracking
In video frames with multiple people, it can be difficult to figure out a person who falls. This is because the algorithm needs to correlate the same person between consecutive frames. But how does it know whether it is looking at the same person if he/she is moving constantly?
The solution is to implement a multiple person tracker. It doesn’t have to be fancy; just a simple general object tracker will suffice. How tracking is done is pretty straightforward and can be outlined in the following steps:
- Compute centroids (taken as the neck points)
- Assign a unique ID to each centroid
- Compute new centroids in the next frame
- Calculate the Euclidean distance between centroids of the current and previous frames, and correlate them based on the minimum distance
- If a correlation is found, update the new centroid with the ID of the old centroid
- If no correlation is found, give the new centroid a unique ID (a new person has entered the frame)
- If a person stays out of the frame for a set number of frames, remove the centroid and its ID
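The steps above can be sketched as a minimal centroid tracker. This is a simplified illustration, not the exact implementation: matching is greedy nearest-neighbour rather than globally optimal, which is fine for a tracker that "doesn't have to be fancy":

```python
import math

class CentroidTracker:
    """Greedy nearest-neighbour tracker over neck-point centroids."""

    def __init__(self, max_missing=10):
        self.next_id = 0
        self.centroids = {}   # person ID -> last known (x, y) neck point
        self.missing = {}     # person ID -> consecutive frames unseen
        self.max_missing = max_missing

    def update(self, new_centroids):
        unmatched = dict(self.centroids)   # IDs not yet claimed this frame
        assigned = {}
        for c in new_centroids:
            if unmatched:
                # Correlate with the existing ID at minimum Euclidean distance.
                pid = min(unmatched, key=lambda i: math.dist(unmatched[i], c))
                del unmatched[pid]
            else:
                pid = self.next_id         # new person enters the frame
                self.next_id += 1
            assigned[pid] = c
            self.missing[pid] = 0
        # Keep unseen people around for a few frames, then drop them.
        for pid, c in unmatched.items():
            self.missing[pid] = self.missing.get(pid, 0) + 1
            if self.missing[pid] <= self.max_missing:
                assigned[pid] = c
        self.centroids = assigned
        # Report only the people actually seen in this frame.
        return {pid: c for pid, c in assigned.items() if self.missing[pid] == 0}
```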
If you want a step-by-step tutorial on object tracking with actual code, check out this post by Adrian Rosebrock.
Fall Detection Algorithm
The initial fall detection algorithm that was conceptualised was relatively simplistic. I first chose the neck as the stable reference point (compare that with swinging arms and legs). Next, I calculated the perceived height of the person based on bounding boxes that defined the entire person. Then, I computed the vertical distance between neck points at intervals of frames. If the vertical distance exceeded half the perceived height of the person, the algorithm would signal a fall.
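In Python, that initial check reduces to something like the sketch below. The names and the interval are illustrative, not the actual implementation; y-coordinates are assumed to grow downward, as in image space:

```python
def fall_detected(neck_history, person_height, frame_interval=25):
    """neck_history: list of (x, y) neck points, one per frame.
    Signal a fall if the neck drops more than half the person's
    perceived height over `frame_interval` frames."""
    if len(neck_history) <= frame_interval:
        return False
    dy = neck_history[-1][1] - neck_history[-1 - frame_interval][1]
    return dy > 0.5 * person_height
```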
However, after coming across multiple YouTube videos of people falling, I realised there were different ways and orientations of falling. Some falls were not detected when the field of view was at an angle, as the victims did not appear to have a drastic change in motion. My model was also not robust enough and kept throwing false positives when people bent down to tie their shoelaces, or ran straight down the video frame.
I decided to implement more features to refine my algorithm:
- Instead of analysing one-dimensional motion (y-axis only), I analysed two-dimensional motion (both x- and y-axes) to encompass different camera angles.
- Added a bounding-box check to see if the width of the person was larger than their height. This assumes the person is on the ground rather than upright, and eliminated false positives from fast-moving people or cyclists.
- Added a two-point check to only watch out for falls if both the person's neck and ankle points can be detected. This prevents inaccurate computation of the person's height when the person is partially occluded.
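Put together, the refined checks might look like the sketch below. Keypoint names, the bounding-box format (`x0, y0, x1, y1`), and the threshold ratio are all illustrative assumptions:

```python
import math

def lying_down(bbox):
    """Bounding-box check: a person wider than they are tall
    is likely on the ground, not upright."""
    x0, y0, x1, y1 = bbox
    return (x1 - x0) > (y1 - y0)

def keypoints_visible(keypoints):
    """Two-point check: only judge falls when both neck and ankle
    were detected, so the height estimate is trustworthy."""
    return keypoints.get("neck") is not None and keypoints.get("ankle") is not None

def refined_fall_check(prev_neck, curr_neck, height, bbox, keypoints, ratio=0.5):
    """2D motion check (x and y) combined with the two guards above."""
    if not keypoints_visible(keypoints):
        return False
    moved = math.dist(prev_neck, curr_neck) > ratio * height
    return moved and lying_down(bbox)
```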
Test Results
As of this writing, extensive fall detection datasets are scarce. I chose the UR Fall Detection Dataset to test my model as it contained different fall scenarios. Out of a total of 30 videos, the model correctly identified 25 falls and missed the other 5 as the subject fell out of the frame. This gave me a recall of 83.33% and an F1 score of 90.91%.
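Those figures follow directly from the confusion counts: 25 true positives, 5 missed falls, and (assuming no false alarms occurred on this dataset, consistent with the quoted F1) 0 false positives:

```python
tp, fn, fp = 25, 5, 0                       # counts from the 30 test videos
precision = tp / (tp + fp)                  # 1.0 with no false alarms
recall = tp / (tp + fn)                     # 25/30
f1 = 2 * precision * recall / (precision + recall)
print(f"recall={recall:.2%}, f1={f1:.2%}")  # recall=83.33%, f1=90.91%
```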
These results can be considered a good start but are far from conclusive due to the small sample size. The lack of other fall-like actions such as tying shoelaces also meant that I could not stress test my model for false positives.
The test was executed on two NVIDIA Quadro GV100s and achieved an average of 6 FPS, which is barely sufficient for real-time processing. The computation across the model's numerous layers is extremely intensive. Models that claim to run at speeds above 15 FPS are typically inaccurate, or are backed by monstrous GPUs.
Applications
Fall detection can be applied in many scenarios to provide assistance. A non-exhaustive list includes:
- Drunk people
- The elderly
- Kids in the playground
- People who suffer from medical conditions like heart attacks or strokes
- Careless people who trip and fall
For the more serious cases, swift response to a fall can mean the difference between life and death.
Future Development
The accuracy of fall detection is heavily dependent on the pose estimation accuracy. Typical pose estimation models are trained on clean images with a full-frontal view of the subject. However, falls cause the subject to be contorted in weird poses, and most pose estimation models are not able to accurately define the skeleton in such scenarios. Furthermore, the models are not robust enough to overcome occlusions or image noise.
To attain a human-level detection accuracy, current pose estimation models will need to be retrained on a larger variety of poses, and include lower-resolution images with occlusions.
Current hardware limitations also impede the ability of pose estimation models to run smoothly on videos with high frame rates. It will be some time before these models will be able to run easily on any laptop with a basic GPU, or even only with a CPU.
Apart from pose estimation, a deep learning model trained specifically on falls would likely perform as well or even better. The model must be trained carefully to distinguish falls from other fall-like actions. This, of course, must be coupled with extensive, publicly available fall datasets to train the model on. Of course, such a model is limited in scope as it only can identify one particular action, and not a variety of actions.
Another possible approach is a knowledge-based system: developing a model that learns the way humans do. This can be achieved via a rule-based system, which makes decisions according to predefined rules, or a case-based system, which draws on similarities to past cases it has seen to make an informed judgement about a new case.
Conclusion
To solve the more difficult problem of general action recognition — which comprises a plethora of actions — we must first understand and master the intricacies of detecting a single action. If we are able to develop a model that can easily identify a fall just like you or I would, we will be able to extract certain patterns that can allow the model to just as easily detect other types of actions.
The path to action recognition is still undoubtedly a challenging one; but just like other cutting-edge models such as OpenAI’s GPT-3, we will be able to discover new techniques previously unheard of.
If you would like to share any ideas or opinions, do leave a comment below, or drop me a connect on LinkedIn.
If you would like to see how I developed the full model, do check out my GitHub repository for the source codes.
Translated from: https://towardsdatascience.com/fall-detection-using-pose-estimation-a8f7fd77081d