To address this problem, we propose to refine those poses during training through rotation and translation/scale optimization.
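The pose refinement above can be pictured with a small sketch. It assumes the correction is split into an axis-angle rotation update and a translation scale plus offset. It only shows how such corrections would be applied to a coarse pose (R, t). How GasMono predicts or optimizes them during training is not modeled, and `refine_pose`, `delta_axis_angle`, `scale`, and `delta_t` are hypothetical names.

```python
import math

Matrix = list[list[float]]
Vector = list[float]


def rodrigues(axis_angle: Vector) -> Matrix:
    """Convert an axis-angle vector into a 3x3 rotation matrix (Rodrigues' formula)."""
    theta = math.sqrt(sum(a * a for a in axis_angle))
    if theta < 1e-12:  # no rotation: return the identity
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (a / theta for a in axis_angle)  # unit rotation axis
    c, s, v = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def refine_pose(R: Matrix, t: Vector, delta_axis_angle: Vector,
                scale: float, delta_t: Vector) -> tuple[Matrix, Vector]:
    """Hypothetical refinement: left-multiply a rotation correction,
    then rescale the translation and add a residual offset."""
    R_refined = matmul(rodrigues(delta_axis_angle), R)
    t_refined = [scale * ti + dti for ti, dti in zip(t, delta_t)]
    return R_refined, t_refined


# Coarse pose: 0.1 rad about z and a unit translation along x.
R0, t0 = rodrigues([0.0, 0.0, 0.1]), [1.0, 0.0, 0.0]
# A correction that undoes the rotation, doubles the scale, and shifts along y.
R1, t1 = refine_pose(R0, t0, [0.0, 0.0, -0.1], 2.0, [0.0, 0.5, 0.0])
```

Composing two rotations keeps `R_refined` a valid rotation. Keeping `scale` separate from `delta_t` mirrors the text's split between rotation and translation/scale optimization.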
To mitigate the effect of low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism, which provides more accurate depth guidance derived from the network itself.
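The sketch below shows one possible reading of "overfitting-aware, iterative self-distillation", and it is purely an assumption. Predictions saved from an earlier round act as the teacher. Each pixel keeps the teacher's depth only where its photometric reprojection error is lower than the student's, and the student is distilled toward those pixels with a log-depth L1 loss. The selection rule, the loss, and all function names are hypothetical and may differ from GasMono's actual mechanism.

```python
import math


def select_pseudo_labels(student_depth, teacher_depth, student_err, teacher_err):
    """Per pixel, keep the teacher depth only where its photometric error is lower.

    Returns the pseudo-label map, a mask of pixels taken from the teacher, and the
    per-pixel error of the selected depth, which never exceeds either input error."""
    pseudo, from_teacher, best_err = [], [], []
    for sd, td, se, te in zip(student_depth, teacher_depth, student_err, teacher_err):
        use_teacher = te < se
        pseudo.append(td if use_teacher else sd)
        from_teacher.append(use_teacher)
        best_err.append(min(se, te))
    return pseudo, from_teacher, best_err


def distillation_loss(student_depth, pseudo, mask):
    """Mean absolute log-depth difference over pixels supervised by the teacher."""
    terms = [abs(math.log(s) - math.log(p))
             for s, p, m in zip(student_depth, pseudo, mask) if m]
    return sum(terms) / len(terms) if terms else 0.0


student = [1.0, 2.0, 4.0]        # current depth predictions (toy 3-pixel "image")
teacher = [1.5, 2.0, 2.0]        # predictions saved from an earlier round
student_err = [0.2, 0.1, 0.5]    # per-pixel photometric errors
teacher_err = [0.1, 0.3, 0.2]

pseudo, mask, best_err = select_pseudo_labels(student, teacher, student_err, teacher_err)
loss = distillation_loss(student, pseudo, mask)
# For the next round, `pseudo` and `best_err` would become the new teacher.
```

Carrying `best_err` forward means the teacher can only be replaced where the student does better. This is one simple way to keep an iterative teacher from drifting toward the student's overfitted predictions.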
Experiments on the NYUv2, ScanNet, 7scenes, and KITTI datasets support the effectiveness of each component of our framework, which sets a new state of the art for indoor self-supervised monocular depth estimation and exhibits outstanding generalization ability. Code and models are available at https://github.com/zxcqlf/GasMono
To alleviate this issue, we propose Noise2Info to extract the critical information, namely the standard deviation σ_n of the injected noise, based only on noisy images. Specifically, we first derive a theoretical upper bound on σ_n; however, this bound requires clean images. We then propose a novel method to estimate the bound on σ_n using only noisy images. In addition, we prove that the difference between our estimate and the true deviation decreases as model training proceeds. Empirical studies show that Noise2Info is effective and robust on benchmark datasets and closely estimates the standard deviation of the noise during model training.
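The identity behind this kind of estimate can be illustrated with a simplified sketch. It is not the paper's derivation. Suppose a denoiser f is J-invariant, meaning its output at pixel i ignores the noisy value at i, and suppose the noise is zero-mean and independent across pixels. Then E[(f(y)_i − y_i)²] = E[(f(y)_i − x_i)²] + σ_n². The self-supervised residual therefore upper-bounds σ_n using noisy data only, and the bound tightens as f approaches the clean signal x. The neighbour-averaging denoiser and the constant test signal are hypothetical stand-ins for a trained network and real images.

```python
import math
import random


def self_supervised_residuals(y, k):
    """Residuals f(y)_i - y_i of a J-invariant denoiser that predicts pixel i
    as the mean of its k neighbours on each side, never looking at y[i]."""
    residuals = []
    for i in range(k, len(y) - k):
        window = y[i - k:i] + y[i + 1:i + k + 1]  # excludes y[i]
        residuals.append(sum(window) / len(window) - y[i])
    return residuals


def sigma_upper_estimate(y, k):
    """sqrt(mean squared residual): an upper bound on sigma in expectation,
    computed from the noisy signal alone."""
    r = self_supervised_residuals(y, k)
    return math.sqrt(sum(v * v for v in r) / len(r))


random.seed(0)
sigma = 0.1
clean = [0.5] * 20000                                   # toy constant "image"
noisy = [x + random.gauss(0.0, sigma) for x in clean]

est_weak = sigma_upper_estimate(noisy, k=1)    # in expectation sqrt(1 + 1/2) * sigma
est_strong = sigma_upper_estimate(noisy, k=5)  # in expectation sqrt(1 + 1/10) * sigma
```

Widening the window plays the role of a better-trained model. The denoiser's own squared error falls from σ_n²/2 to σ_n²/10, so the estimate moves closer to σ_n from above.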
Requiring such knowledge is the main limitation of SSL, and it is often tackled with ad hoc strategies, e.g., applying known data augmentations to the same input.
In this work, we generalize and formalize this principle through Positive Active Learning (PAL), in which an oracle queries semantic relationships between samples.
PAL achieves three main objectives. First, it is a theoretically grounded learning framework that encompasses not only standard SSL but also supervised and semi-supervised learning, depending on the oracle employed.
Second, it provides a consistent algorithm to embed a priori knowledge, e.g., some observed labels, into any SSL loss without any change to the training pipeline.
Third, it provides a proper active learning framework that yields low-cost solutions for annotating datasets, arguably bridging the gap between the theory and practice of active learning through queries about semantic relationships between inputs that non-experts can answer easily.
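To make the oracle idea concrete, the sketch below encodes each learning setting as a pairwise oracle. It turns the oracle's answers into a 0/1 relation graph that an SSL loss could consume as its set of positive pairs, so only the graph changes and the training loop does not. The toy batch, the oracle functions, and the random pair selection in `active_queries` are illustrative assumptions. They are not PAL's actual algorithm or query strategy.

```python
import random
from itertools import combinations


def build_graph(n, oracle):
    """Symmetric 0/1 matrix: G[i][j] = 1 if the oracle relates samples i and j."""
    G = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if oracle(i, j):
            G[i][j] = G[j][i] = 1
    return G


# Toy batch: samples 0 and 1 are augmentations of image A, 2 and 3 of image B,
# and 4 of image C. Images A and B are cats, C is a dog.
source = [0, 0, 1, 1, 2]
labels = ["cat", "cat", "cat", "cat", "dog"]
observed = {0: "cat", 2: "cat"}  # only a few labels are known a priori


def ssl_oracle(i, j):
    """Standard SSL: positives are views of the same source image."""
    return source[i] == source[j]


def supervised_oracle(i, j):
    """Supervised learning: positives share a class label."""
    return labels[i] == labels[j]


def semi_supervised_oracle(i, j):
    """SSL positives plus pairs whose observed labels agree."""
    both_known = i in observed and j in observed
    return ssl_oracle(i, j) or (both_known and observed[i] == observed[j])


def active_queries(n, oracle, budget, rng):
    """Ask the oracle about `budget` randomly chosen pairs (a placeholder for a
    real acquisition strategy) and return the answers."""
    pairs = list(combinations(range(n), 2))
    rng.shuffle(pairs)
    return {pair: oracle(*pair) for pair in pairs[:budget]}


G_ssl = build_graph(5, ssl_oracle)
G_sup = build_graph(5, supervised_oracle)
G_semi = build_graph(5, semi_supervised_oracle)
answers = active_queries(5, supervised_oracle, budget=3, rng=random.Random(0))
```

Each query in `active_queries` is a yes/no question about two inputs, which is the kind of cheap, non-expert annotation the text describes.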
Our framework consists of three pre-training stages at different levels:
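1) an image-level pre-training stage that globally incorporates mirror reflection features into the pre-trained model;
2) a patch-level pre-training stage that simulates and learns local mirror reflections from image patches;
3) a pixel-level pre-training stage that captures mirror reflections pixel-wise by reconstructing corrupted mirror images based on the relationship between the inside and outside of mirrors.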
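The patch- and pixel-level stages could be prototyped roughly as follows. This is a hypothetical sketch, not the authors' pipeline. A synthetic "reflection" is made by horizontally flipping a patch from elsewhere in the image and pasting it in, and the paste mask serves as a free pretext label. A reconstruction loss is then scored only inside the corrupted mirror region, so the unmasked surroundings act as context. `paste_reflection`, `masked_reconstruction_loss`, and the toy 4x4 image are assumptions.

```python
def paste_reflection(image, src, dst, size):
    """Copy a size x size patch whose top-left corner is `src`, flip it
    horizontally, and paste it with its top-left corner at `dst`.
    Returns the new image and a 0/1 mask of the synthetic mirror region."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    mask = [[0] * w for _ in range(h)]
    (sr, sc), (dr, dc) = src, dst
    for r in range(size):
        flipped = image[sr + r][sc:sc + size][::-1]  # horizontal flip, like a mirror
        for c in range(size):
            out[dr + r][dc + c] = flipped[c]
            mask[dr + r][dc + c] = 1
    return out, mask


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed only over masked (corrupted) pixels."""
    errs = [(p - t) ** 2
            for p_row, t_row, m_row in zip(pred, target, mask)
            for p, t, m in zip(p_row, t_row, m_row) if m]
    return sum(errs) / len(errs) if errs else 0.0


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 image
mirrored, mirror_mask = paste_reflection(image, src=(0, 0), dst=(2, 2), size=2)

# Pixel-level pretext task: corrupt the mirror region, then score a reconstruction.
corrupted = [[0.0 if m else v for v, m in zip(row, m_row)]
             for row, m_row in zip(mirrored, mirror_mask)]
baseline_loss = masked_reconstruction_loss(corrupted, mirrored, mirror_mask)
```

Here `baseline_loss` scores the trivial "predict zero" reconstruction. A pre-trained network would be expected to lower it by exploiting the link between the mirror region and the content outside it.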
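Extensive experiments show that our SSL pre-training framework significantly outperforms previous state-of-the-art CNN-based SSL pre-training frameworks, and even outperforms supervised ImageNet pre-training when transferred to the mirror detection task.
Code and models are available at https://jiaying.link/iccv2023-sslmirror/.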
the feature space indiscriminately. In this study, we introduce feature-level augmentation and propose a novel semantics-consistent feature search (SCFS) method to mitigate this negative effect. The main idea of SCFS is to adaptively search semantics-consistent features to enhance the contrast between semantics-consistent regions in different augmentations. Thus, the trained model can learn to focus on meaningful object regions, improving its semantic representation ability. Extensive experiments on different datasets and tasks demonstrate that SCFS effectively improves the performance of self-supervised learning and achieves state-of-the-art performance on different downstream tasks.
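Under simplifying assumptions, the adaptive search can be pictured as a nearest-neighbour lookup. For a local feature from one augmented view, pick the most similar local feature from the other view by cosine similarity, and treat that pair as the positive in a contrastive (InfoNCE) loss. The function names, the temperature `tau=0.2`, and the use of the remaining regions as negatives are hypothetical. SCFS's actual search and loss may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_consistent(query, candidates):
    """Index of the candidate local feature most similar to the query."""
    sims = [cosine(query, c) for c in candidates]
    return max(range(len(sims)), key=sims.__getitem__)


def info_nce(query, positive, negatives, tau=0.2):
    """InfoNCE loss: -log softmax of the positive among cosine logits / tau."""
    logits = [cosine(query, positive) / tau] + [cosine(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


query = [0.9, 0.1]                              # local feature from view A
view_b = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]   # local features from view B

idx = search_consistent(query, view_b)
others = [f for i, f in enumerate(view_b) if i != idx]
loss_searched = info_nce(query, view_b[idx], others)
# For comparison: a fixed, semantically mismatched positive (region 1).
loss_mismatched = info_nce(query, view_b[1], [view_b[0], view_b[2]])
```

Comparing `loss_searched` with `loss_mismatched` shows why searching for the consistent region matters. Contrasting a query against a semantically unrelated "positive" produces a large, misleading loss.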