[Paper reading, ICCV 2019 oral, object tracking] Learning Discriminative Model Prediction for Tracking

Author: 我家自动化 | 2024-08-21 06:26:03

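Introduction

paper: Learning Discriminative Model Prediction for Tracking

code: visionml/pytracking

Reference: 看懂这篇视觉跟踪算法你就可以超神了

DiMP is a classic from Martin's ongoing run of strong work. The paper has three motivations. First, current Siamese trackers use only the target's features and ignore background information. Second, current Siamese trackers are trained offline, while the objects to be tracked are mostly absent from the training set, which makes these trackers unreliable during online tracking. Third, most current SOTA trackers use very simple template update strategies, which hurts robustness.

To address these problems, the paper builds on ATOM and proposes DiMP, a more powerful tracking model.

As the figure below shows, the Siamese-style trackers that have been popular in recent years typically use only the target feature (obtained by cropping) and usually do no online training.

[Figure: image not preserved in the source]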

Main Content

[Figure: image not preserved in the source]

DiMP mainly improves the target classification part of ATOM. The figure above shows DiMP's target classification branch.

Discriminative Learning Loss

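The key component of target classification is the model predictor D, which is updated through online training so that the model becomes more reliable.

[Figure: image not preserved in the source]

For the target classification part, the paper therefore proposes the following loss function: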

$L(f)=\frac{1}{\left|S_{\text {train }}\right|} \sum_{(x, c) \in S_{\text {train }}}\|r(x * f, c)\|^{2}+\|\lambda f\|^{2}$

where $f=D(S_{train})$, $*$ denotes convolution and $\lambda$ is a regularization factor. The function $r(s, c)$ computes the residual at every spatial location based on the target confidence scores $s = x * f$ and the ground-truth target center coordinate $c$.
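To make the loss concrete, here is a minimal NumPy sketch on 1-D toy data. It is not the pytracking implementation: the function name `filter_loss`, the use of `np.correlate` as a stand-in for the convolution $x * f$, and the plain residual $s - y_c$ are all assumptions made for illustration.

```python
import numpy as np

def filter_loss(f, samples, lam):
    """L(f) = mean over samples of ||r(x * f, c)||^2, plus ||lam * f||^2.

    Assumption: r is the plain difference s - y_c, and each sample is a
    pair (x, y_c) of a 1-D feature vector and its desired score map.
    """
    total = 0.0
    for x, y_c in samples:
        s = np.correlate(x, f, mode="same")  # score map s = x * f
        r = s - y_c                          # residual at every location
        total += np.sum(r ** 2)
    return total / len(samples) + np.sum((lam * f) ** 2)

# Toy example: a filter that reproduces the desired scores exactly,
# so only the regularization term contributes.
x = np.array([0.0, 1.0, 0.0])
y_c = np.array([0.0, 1.0, 0.0])
f = np.array([0.0, 1.0, 0.0])
loss_value = filter_loss(f, [(x, y_c)], lam=0.1)
```

In this toy case the data term vanishes, so `loss_value` is just the regularization term $\|\lambda f\|^2$.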

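For the residual function $r(x * f, c)$, the paper argues that the common simple choice $r(x*f,c)=x*f-y_c$ (where $y_c$ are the desired target scores at each location, popularly set to a Gaussian function centered at $c$) makes the model focus on negative samples, because only a small part of the Gaussian label has large values. As a result, the learned model is not the most discriminative one.

Inspired by SVMs, the paper therefore uses a hinge-like loss in the residual function, defined as: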

$r(s, c)=v_{c} \cdot\left(m_{c} s+\left(1-m_{c}\right) \max (0, s)-y_{c}\right)$

Here, the target mask $m_c$, the spatial weight $v_c$, the regularization factor $\lambda$, and the regression target $y_c$ can all be learned rather than set by hand. Section 3.4 of the paper describes this in detail, and I briefly cover it later in this post.
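The effect of the hinge is easiest to see numerically. The sketch below evaluates the residual element-wise on a 1-D toy score map. The function name `hinge_residual` and the hand-picked values for $m_c$, $v_c$ and $y_c$ are illustrative assumptions; in DiMP these quantities are learned.

```python
import numpy as np

def hinge_residual(s, y_c, m_c, v_c):
    """r(s, c) = v_c * (m_c * s + (1 - m_c) * max(0, s) - y_c), element-wise."""
    s = np.asarray(s, dtype=float)
    return v_c * (m_c * s + (1.0 - m_c) * np.maximum(0.0, s) - y_c)

# Toy 1-D score map with the target at index 2 (all values are made up).
s = np.array([-2.0, 0.5, 0.8, -0.3])   # predicted scores
y_c = np.array([0.0, 0.0, 1.0, 0.0])   # desired scores
m_c = np.array([0.0, 0.0, 1.0, 0.0])   # 1 in the target region, 0 in background
v_c = np.ones(4)                       # uniform spatial weight

r_hinge = hinge_residual(s, y_c, m_c, v_c)
r_plain = s - y_c
```

In the background region, negative scores are clipped to zero before the comparison with $y_c$, so the location with score -2.0 contributes no residual under the hinge, whereas the plain difference penalizes it with -2.0. The positive background score 0.5 is still penalized in both cases.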

Optimization-Based Architecture

The previous section introduced the loss function for target classification. Minimizing it yields the optimal filter $f$.

The most direct way to minimize the loss is gradient descent:

$f^{(i+1)}=f^{(i)}-\alpha \nabla L\left(f^{(i)}\right)$

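Martin argues that gradient descent converges slowly here because it uses a fixed step length instead of adapting the step to the current data or the current model estimate. He therefore uses steepest descent, which computes the step length adaptively at each iteration, to iteratively optimize toward a good filter $f$ (see Section 3.2 of the paper).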
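The difference between a fixed and an adaptive step length can be sketched on a small regularized least-squares problem. This is a stand-in, not the paper's optimizer: the linear model `A @ f`, the helper names, and the use of the exact Hessian `H` in the step length $\alpha = g^{T}g / (g^{T}Hg)$ are assumptions chosen so that the example stays self-contained.

```python
import numpy as np

def loss(f, A, b, lam):
    """Toy quadratic objective ||A f - b||^2 + ||lam * f||^2."""
    return np.sum((A @ f - b) ** 2) + np.sum((lam * f) ** 2)

def grad(f, A, b, lam):
    """Gradient of the toy objective."""
    return 2.0 * A.T @ (A @ f - b) + 2.0 * lam**2 * f

def fixed_step_descent(f, A, b, lam, alpha, n_iter):
    """Plain gradient descent with a constant step length alpha."""
    for _ in range(n_iter):
        f = f - alpha * grad(f, A, b, lam)
    return f

def steepest_descent(f, A, b, lam, n_iter):
    """Gradient descent whose step length is recomputed every iteration."""
    H = 2.0 * (A.T @ A + lam**2 * np.eye(A.shape[1]))  # Hessian of the objective
    for _ in range(n_iter):
        g = grad(f, A, b, lam)
        denom = g @ H @ g
        if denom <= 1e-12:          # gradient is (numerically) zero
            break
        alpha = (g @ g) / denom     # step that minimizes the loss along -g
        f = f - alpha * g
    return f

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
f0, lam = np.zeros(5), 0.1

loss_fixed = loss(fixed_step_descent(f0, A, b, lam, alpha=1e-3, n_iter=5), A, b, lam)
loss_steep = loss(steepest_descent(f0, A, b, lam, n_iter=5), A, b, lam)
```

With the same budget of five iterations, the adaptive variant typically ends much closer to the optimum than the small fixed step does, which is the motivation for using only a handful of optimizer iterations inside the model predictor.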

Initial Filter Prediction

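The model predictor D also contains a model initializer module, which consists of a convolutional layer followed by a precise ROI pooling layer. This module only provides a reasonable initial estimate rather than predicting the final model; the final model comes from the model optimizer.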

However, rather than predicting the final model, our initializer network is tasked with only providing a reasonable initial estimate, which is then processed by the optimizer module to provide the final model.

[Figure: image not preserved in the source]

Learning the Discriminative Learning Loss

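As mentioned earlier, $m_c$, $v_c$, $\lambda$ and $y_c$ in the residual function $r(s, c)$ can all be learned, whereas earlier trackers generally designed them by hand.

The paper uses the regression target $y_c$ as an example. $y_c$ is usually set by hand to a Gaussian-shaped label; the paper instead defines it as: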

$y_{c}(t)=\sum_{k=0}^{N-1} \phi_{k}^{y} \rho_{k}(\|t-c\|) .$

$\rho_{k}(d)=\begin{cases} \max \left(0,1-\frac{|d-k \Delta|}{\Delta}\right), & k<N-1 \\ \max \left(0, \min \left(1,1+\frac{d-k \Delta}{\Delta}\right)\right), & k=N-1 \end{cases}$

Here $\rho_{k}(d)$ and $\|t-c\|$ are computed values; the only parameters that actually need to be learned are the coefficients $\phi_{k}^{y}$. In short, training learns a good set of $\phi_{k}^{y}$.
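The two formulas above describe a radial, piecewise-linear function of the distance to the target center: the basis functions $\rho_k$ are triangles spaced $\Delta$ apart, and the last one saturates at 1 for large distances. A minimal NumPy sketch follows; the function names and the coefficient values in `phi` are hypothetical and stand in for learned parameters.

```python
import numpy as np

def rho(d, k, N, delta):
    """Triangular basis function rho_k(d); the last one (k = N-1) saturates at 1."""
    d = np.asarray(d, dtype=float)
    if k < N - 1:
        return np.maximum(0.0, 1.0 - np.abs(d - k * delta) / delta)
    return np.maximum(0.0, np.minimum(1.0, 1.0 + (d - k * delta) / delta))

def label(t, c, phi, delta):
    """y_c(t) = sum_k phi[k] * rho_k(||t - c||)."""
    N = len(phi)
    d = np.linalg.norm(np.asarray(t, dtype=float) - np.asarray(c, dtype=float), axis=-1)
    return sum(phi[k] * rho(d, k, N, delta) for k in range(N))

# Hypothetical coefficients: high score at the center, decaying to zero far away.
phi = [1.0, 0.5, 0.0]
c = (0.0, 0.0)
points = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [5.0, 0.0]])
y = label(points, c, phi, delta=1.0)
```

At distances that are exact multiples of $\Delta$ the label equals the corresponding coefficient, and between knots it is interpolated linearly. Beyond the last knot it stays at `phi[-1]`.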

Bounding Box Estimation

[Figure: image not preserved in the source]

The target estimation part uses the IoU-Net structure from ATOM, as shown in the figure above (see ATOM for details).

Offline Training

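During offline training, the feature extractor, the target classification part and the target estimation part are trained together as a whole.

The classification loss for the target classification part is defined as follows (see the original paper for details):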

$L_{\mathrm{cls}}=\frac{1}{N_{\mathrm{iter}}} \sum_{i=0}^{N_{\mathrm{iter}}} \sum_{(x, c) \in S_{\mathrm{test}}}\left\|\ell\left(x * f^{(i)}, z_{c}\right)\right\|^{2}$

$\ell(s, z)=\begin{cases} s-z, & z>T \\ \max (0, s), & z \leq T \end{cases}$

The target estimation part uses an IoU loss between the predicted bbox and the ground truth.

The final total loss is defined as:

$L_{\mathrm{tot}}=\beta L_{\mathrm{cls}}+L_{\mathrm{bb}}$
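The offline losses can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the training code: the function names are hypothetical, the score maps are 1-D toy arrays, $L_{\mathrm{bb}}$ is passed in as a precomputed number, and the values of `T` and `beta` are arbitrary. The sum runs over the initial filter plus every optimizer iterate ($i = 0, \dots, N_{\mathrm{iter}}$) and is divided by $N_{\mathrm{iter}}$, as in the formula above.

```python
import numpy as np

def hinge_test_loss(s, z, T):
    """l(s, z): regress scores where z > T, apply a hinge max(0, s) elsewhere."""
    return np.where(z > T, s - z, np.maximum(0.0, s))

def classification_loss(scores_per_iter, labels, T):
    """L_cls over all filter iterates f^(0), ..., f^(N_iter).

    scores_per_iter[i][j] is the score map x_j * f^(i) for test sample j,
    and labels[j] is the corresponding target z_c.
    """
    n_iter = len(scores_per_iter) - 1   # f^(0) is the initializer output
    if n_iter < 1:
        raise ValueError("need the initial filter plus at least one iterate")
    total = 0.0
    for maps in scores_per_iter:
        for s, z in zip(maps, labels):
            total += np.sum(hinge_test_loss(s, z, T) ** 2)
    return total / n_iter

def total_loss(l_cls, l_bb, beta):
    """L_tot = beta * L_cls + L_bb."""
    return beta * l_cls + l_bb

# One test sample, scores from the initial filter and from one optimizer step.
z = np.array([1.0, 0.0, 0.0])
scores_per_iter = [[np.array([0.5, 0.4, -0.2])], [np.array([0.9, -0.5, 0.3])]]
l_cls = classification_loss(scores_per_iter, [z], T=0.05)
l_tot = total_loss(l_cls, l_bb=0.2, beta=100.0)
```

Because every iterate contributes to the loss, the optimizer is pushed to produce a useful filter after each step rather than only at the end.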

Online Tracking

During online tracking, target classification first localizes the target center; target estimation then predicts the target's bbox.
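The first of these two steps can be illustrated with a simple peak search on the classification score map. This is only a sketch of the idea: the function name `locate_center` and the toy score map are assumptions, and a real tracker would also map the peak back to image coordinates before handing it to the bbox estimation branch, which is not shown here.

```python
import numpy as np

def locate_center(score_map):
    """Return the (row, col) position of the highest classification score."""
    row, col = np.unravel_index(np.argmax(score_map), score_map.shape)
    return int(row), int(col)

# Toy 5x5 score map with a strong peak and a weaker distractor.
scores = np.zeros((5, 5))
scores[3, 1] = 0.9
scores[0, 4] = 0.4
center = locate_center(scores)
```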

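Experimental Results

[Figure: experimental results; image not preserved in the source]

Summary

A classic masterpiece from Martin that deserves careful study. I can't help but admire his mathematical skill. Too good!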
