Contents
1. Can metrics such as ATE RMSE and RPE RMSE indirectly indicate the mapping quality of different SLAM algorithms?
[1] A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM
[2] ElasticFusion: Dense SLAM without a pose graph
[3] Bundlefusion: Real-time globally consistent 3D reconstruction using on-the-fly surface reintegration
[4] Robust reconstruction of indoor scenes
[5] Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects
[6] Real-time non-rigid reconstruction using an RGB-D camera
[7] Real-time large-scale dense RGB-D SLAM with volumetric fusion
[8] Co-fusion: Real-time segmentation, tracking and fusion of multiple objects
[9] Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals
[10] Kimera: an open-source library for real-time metric-semantic localization and mapping
[11] Densifying sparse vio: a mesh-based approach using structural regularities
[12] Fusion4d: Real-time performance capture of challenging scenes
Question:
How should SLAM mapping quality be evaluated? Can metrics such as ATE RMSE and RPE RMSE indirectly indicate how well different SLAM algorithms map? Thanks!
Thanks for the question!
I'm also very interested in this question, so I spent some time looking into it. Since Zhishi Xingqiu (Knowledge Planet) does not handle rich text well and only allows a limited number of attached images, I have put the complete answer in the attached figures.
----
By "SLAM mapping" here I take the asker to mean dense mapping of the kind performed by RGB-D SLAM.
Yes.
Reference [4] states this directly when evaluating trajectory accuracy:
CoFusion [8], meanwhile, puts it more implicitly:
And Section 6.2 of [Stückler, Jörg, and Sven Behnke. "Multi-resolution surfel maps for efficient dense 3D modeling and tracking." Journal of Visual Communication and Image Representation 25.1 (2014): 137-147.] discusses scene-reconstruction evaluation, yet still uses trajectory accuracy for its quantitative experiments.
However, some papers take a firmer position, holding that trajectory error cannot be used directly to express mapping accuracy.
Reference [7] argues that an accurate trajectory estimate does not imply an accurate map:
"We present a number of quantitative and qualitative results on evaluating the surface reconstructions produced by our system. In our experience a high score on a camera trajectory benchmark does not always imply a high-quality surface reconstruction due to the frame-to-model tracking component of the system. In previous work we found that although other methods for camera pose estimation may score better on benchmarks, the resulting reconstructions are not as accurate if frame-to-model tracking is not being utilized (Whelan et al., 2013a)."
The ICL-NUIM dataset paper [1] also notes near its conclusion:
"Further, we have evaluated a number of existing visual odometry methods within the Kintinuous pipeline and shown through experimentation that a good trajectory estimate, which previous to this paper was the only viable benchmark measure, is not indicative of a good surface reconstruction."
That is, good trajectory accuracy does not indicate an accurate surface reconstruction.
Most quantitative evaluations align the dense reconstruction against a ground-truth model, and for each point or small triangle of the reconstructed model (when the model is a triangle mesh) compute the distance to the nearest ground-truth vertex, or the distance to the nearest ground-truth triangle. Once these distances are obtained, statistics such as the mean, median, standard deviation, minimum and maximum are used to characterize mapping accuracy.
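As a minimal sketch of this recipe on toy data (a brute-force nearest-neighbor search; real evaluations use a KD-tree or CloudCompare, and the clouds here are made up for illustration):

```python
import numpy as np

def cloud_to_cloud_errors(recon_pts, gt_pts):
    """For each reconstructed point, the distance to its nearest ground-truth
    point, summarized by the five standard statistics.

    recon_pts: (N, 3) reconstructed cloud; gt_pts: (M, 3) ground-truth cloud.
    Brute force, O(N*M) memory, so only suitable for small clouds.
    """
    # Pairwise distances, then the minimum over the ground-truth axis.
    diffs = recon_pts[:, None, :] - gt_pts[None, :, :]
    dists = np.linalg.norm(diffs, axis=2).min(axis=1)
    return {
        "mean": dists.mean(), "median": np.median(dists),
        "std": dists.std(), "min": dists.min(), "max": dists.max(),
    }
```

A reconstruction shifted 0.1 m from the ground truth would then report a mean (and max) error of 0.1 m.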
There are also some other approaches, for example heat-map visualization, per-depth-pixel error histograms, and the Hausdorff distance, as described in the papers below.
As for qualitative evaluation, almost every dense-mapping paper includes some, usually to highlight its own strengths in particular scenarios, so I will not list them separately.
[1] Handa, Ankur, et al. "A benchmark for RGB-D visual odometry, 3D reconstruction and SLAM." 2014 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2014.
Evaluation method: 1
The ICL-NUIM dataset paper. From Section VI.B, Error metrics:
"We quantify the accuracy of surface reconstruction by using the “cloud/mesh” distance metric provided by CloudCompare. The process involves firstly coarsely aligning the reconstruction with the source model by manually selecting point correspondences. From here, the mesh model is densely sampled to create a point cloud model which the reconstruction is finely aligned to using ICP. Finally, for each vertex in the reconstruction, the closest triangle in the model is located and the perpendicular distance between the vertex and closest triangle is recorded. Five standard statistics are computed over the distances for all vertices in the reconstruction: Mean, Median, Std., Min and Max. We provide a tutorial on executing this process at http://www.youtube.com/watch?v=X9gDAElt8HQ."
(In CloudCompare: 1. select the two point clouds; 2. click Tools->Distance->cloud/cloud; 3. choose a color scale and compute.)
So unlike the tools that compute pose-trajectory accuracy automatically, ICL-NUIM recommends manually providing an initial alignment for the point-cloud models, refining the alignment automatically with ICP, and finally computing point-to-triangle distances whose statistics characterize mapping accuracy. The video [www.youtube.com/watch?v=X9gDAElt8HQ] demonstrates the full process, and the quantitative metrics of many later papers are designed with reference to ICL-NUIM's evaluation protocol.
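The core of the final step, the perpendicular distance from a reconstruction vertex to a ground-truth triangle's plane, can be sketched as below. This is a simplification: the full metric also handles the case where the perpendicular foot falls outside the triangle (falling back to edge/vertex distances), which is omitted here.

```python
import numpy as np

def vertex_to_plane_distance(v, tri):
    """Perpendicular distance from vertex v (3,) to the supporting plane of
    triangle `tri` (3x3, one corner per row)."""
    a, b, c = tri
    n = np.cross(b - a, c - a)      # plane normal (unnormalized)
    n = n / np.linalg.norm(n)
    return abs(np.dot(v - a, n))    # signed offset along the unit normal
```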
[2] Whelan, Thomas, et al. "ElasticFusion: Dense SLAM without a pose graph." Robotics: Science and Systems, 2015.
Evaluation method: 1 (mean distance from each reconstructed point to the nearest ground-truth surface)
For the result tables and figures, see the images attached to this answer; likewise below.
[3] Dai, Angela, et al. "Bundlefusion: Real-time globally consistent 3d reconstruction using on-the-fly surface reintegration." ACM Transactions on Graphics (ToG) 36.4 (2017): 1.
Evaluation method: 1
[4] Choi, Sungjoon, Qian-Yi Zhou, and Vladlen Koltun. "Robust reconstruction of indoor scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
Evaluation method: 1
“To evaluate surface reconstruction accuracy on ICL-NUIM scenes we use the error measures proposed by Handa et al., specifically the mean and median of the distances of the reconstructed surfaces to the ground-truth surfaces.
...
Note that this is a direct evaluation of the metric accuracy of reconstructed models.”
[5] Runz, Martin, Maud Buffier, and Lourdes Agapito. "Maskfusion: Real-time recognition, tracking and reconstruction of multiple moving objects." 2018 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). IEEE, 2018.
Evaluation method: 2 (heat-map visualization)
“The average 3D error for the bleach bottle was 7.0mm with a standard deviation of 5.8mm (where the GT bottle is 250mm tall and 100mm across).”
Unlike ElasticFusion and BundleFusion, MaskFusion does not measure mapping accuracy with a single metric; instead it presents the quantitative evaluation qualitatively as a heat map.
[6] Zollhöfer, Michael, et al. "Real-time non-rigid reconstruction using an RGB-D camera." ACM Transactions on Graphics (ToG) 33.4 (2014): 1-12.
Although this reconstruction system was not designed for SLAM, some other RGB-D SLAM systems have used it for comparison, so I include it here.
“The figure compares renderings of the original mesh and our reconstruction, as well as plots of the deviation for three frames of the animation, where red corresponds to a fitting error of 3mm. ”
The paper only says the comparison is against ground truth; it does not explain in detail how the fitting error is computed. Also, its Figure 10 shows heat maps rendered on human faces, which honestly cost me quite a few SAN points, so to spare your eyes I will not reproduce them here; interested readers can look them up in the paper.
[7] Whelan, Thomas, et al. "Real-time large-scale dense RGB-D SLAM with volumetric fusion." The International Journal of Robotics Research 34.4-5 (2015): 598-626.
Evaluation method: 2+3+5
"Given that both maps lie in the global coordinate frame we can iteratively minimize nearest-neighbor point-wise correspondences between the two maps using standard point-to-plane ICP. ... We measure the remaining root–mean–square residual error between point correspondences as the residual similarity error between the two maps."
Note that the heat map here compares maps built in two different ways rather than against ground truth, but it still illustrates roughly how mapping accuracy can be evaluated.
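Assuming ICP has already produced matched correspondences (the arrays below are hypothetical placeholders), the residual similarity error quoted above reduces to a point-to-plane RMSE:

```python
import numpy as np

def point_to_plane_rmse(src_pts, dst_pts, dst_normals):
    """RMSE of point-to-plane residuals over already-matched correspondences:
    the signed offset of each src point along its partner's unit normal.

    src_pts, dst_pts, dst_normals: (N, 3) arrays, row i matched to row i.
    """
    resid = np.einsum("ij,ij->i", src_pts - dst_pts, dst_normals)
    return np.sqrt(np.mean(resid ** 2))
```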
"However, each RGB-D frame does have ground truth depth information which we compare against. For each frame in a dataset we compute a histogram of the per-depth-pixel L1-norm error between the ground truth depth map and the predicted surface depth map raycast from the TSDF, normalizing by the number of valid pixels before aligning all histograms into a two-dimensional area plot."
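A per-frame version of this depth-error histogram might look as follows; this is a sketch assuming depth maps are 2-D arrays with 0 marking invalid pixels:

```python
import numpy as np

def depth_error_histogram(gt_depth, pred_depth, bins, max_err):
    """Histogram of per-pixel L1 depth error for one frame, normalized by the
    number of valid pixels (here: pixels where both depths are > 0)."""
    valid = (gt_depth > 0) & (pred_depth > 0)
    err = np.abs(gt_depth[valid] - pred_depth[valid])
    hist, edges = np.histogram(err, bins=bins, range=(0.0, max_err))
    return hist / valid.sum(), edges
```

Stacking these normalized histograms over all frames gives the two-dimensional area plot described in the quote.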
[8] Rünz, Martin, and Lourdes Agapito. "Co-fusion: Real-time segmentation, tracking and fusion of multiple objects." 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017.
Evaluation method: 2+3 (qualitative heat-map visualization + quantitative per-object errors against ground truth)
"This error is strongly conditioned on the tracking, but nicely highlights the quality of the overall system. For each surfel in the unified map of active models, we compute the distance to the closest point on the ground-truth meshes, after aligning the two representations. Figure 8 visualizes the reconstruction error as a heat-map and highlights differences to Elastic-Fusion. For the real scene Esone1 we computed the 3D reconstruction errors of each object independently. The results are shown in Table I and Figure 10."
It likewise uses heat maps to present the quantitative evaluation qualitatively.
The table additionally computes and compares the proportion of points with errors above 1 cm and above 5 cm, which also reflects mapping accuracy from another angle.
[9] Palazzolo, Emanuele, et al. "Refusion: 3d reconstruction in dynamic environments for rgb-d cameras exploiting residuals." arXiv preprint arXiv:1905.02082 (2019).
Evaluation method: 2+3
"We compare the models built by our algorithm and by StaticFusion [14] for the sequences crowd3 and removing nonobstructing box w.r.t. the ground truth. For each point of the evaluated model, we measure its distance from the ground truth."
"For a quantitative evaluation, Fig. 11 shows the cumulative percentage of points at a certain distance from the ground truth for the models of the two considered sequences. The plots show in both cases that the reconstructed model by our approach is more accurate."
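Given the per-point distances from the previous step, this cumulative-percentage curve is a one-liner (the thresholds below are made up for illustration):

```python
import numpy as np

def cumulative_error_curve(dists, thresholds):
    """Fraction of points whose distance to ground truth is <= each threshold."""
    dists = np.sort(np.asarray(dists))
    # searchsorted on the sorted distances counts how many fall at or below t.
    return [np.searchsorted(dists, t, side="right") / len(dists)
            for t in thresholds]
```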
[10] Rosinol, Antoni, et al. "Kimera: an open-source library for real-time metric-semantic localization and mapping." 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020.
Evaluation method: 2+4
"...and (iii) we evaluate the average distance from ground truth point cloud to its nearest neighbor in the estimated point cloud (accuracy), and vice-versa (completeness)."
Kimera does not describe its mapping accuracy and completeness with a single number the way ElasticFusion and BundleFusion do; instead it presents them qualitatively in heat-map form.
[11] Rosinol, Antoni. Densifying sparse vio: a mesh-based approach using structural regularities. MS thesis. ETH Zurich; Massachusetts Institute of Technology (MIT), 2018.
Evaluation method: 1+4
Accuracy:
"With the newly registered point cloud, we can compute a cloud to cloud distance to assess the accuracy of the mesh relative to the ground-truth point cloud. Used Approach We compute the cloud to cloud absolute distance using the nearest neighbour distance. For each point of the estimated cloud from the mesh, we search the nearest point in the reference cloud and compute their Euclidean distance."
Completeness:
"Similarly to the accuracy, we define the completeness of the mesh as the percentage of points within a threshold of the ground truth."
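The two metrics can be sketched as below. Note the direction conventions vary: the thesis measures estimate-to-ground-truth distances, while the Kimera paper quoted above swaps the directions for accuracy vs. completeness; this sketch follows the thesis wording.

```python
import numpy as np

def nn_dists(src, dst):
    # Brute-force nearest-neighbor distance from every src point to dst.
    return np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2).min(axis=1)

def mesh_accuracy(est_pts, gt_pts):
    """Mean nearest-neighbor distance from the estimated cloud to ground truth."""
    return nn_dists(est_pts, gt_pts).mean()

def mesh_completeness(est_pts, gt_pts, thresh):
    """Fraction of estimated points within `thresh` of the ground truth."""
    return (nn_dists(est_pts, gt_pts) <= thresh).mean()
```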
Kimera's two metrics are computed with reference to this thesis.
[12] Dou, Mingsong, et al. "Fusion4d: Real-time performance capture of challenging scenes." ACM Transactions on Graphics (TOG) 35.4 (2016): 1-13.
Evaluation method: 3+6
"In Fig. 15, we compare to the dataset of [Collet et al. 2015] for a sequence with extremely high motions. The figure compares renderings of the original meshes and multiple reconstructions, where red corresponds to a fitting error of 15mm. In particular, we compare our method with [Zollhöfer et al. 2014] and [Newcombe et al. 2015], showing our superior reconstructions in these challenging situations."
The Hausdorff distance is used as the distance metric.
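For two sampled point sets, the symmetric Hausdorff distance can be sketched in a few lines (brute force; in practice mesh tools such as Metro compute a mesh-based version):

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a (N,3) and b (M,3):
    the largest of all nearest-neighbor distances in either direction."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return max(d.min(axis=1).max(),   # worst a -> b nearest-neighbor distance
               d.min(axis=0).max())   # worst b -> a nearest-neighbor distance
```

Unlike the mean distance of method 1, this reports the single worst deviation, so it is sensitive to outliers.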