[1] Localization Distillation for Dense Object Detection

keywords: Bounding Box Regression, Localization Quality Estimation, Knowledge Distillation

paper | code

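Write-up: Ming-Ming Cheng's team at Nankai University and Tianjin University propose LD, localization distillation for object detection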

Video Object Detection

[1] Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering

paper | video

3D Object Detection

[21] CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

paper

[20] Forecasting from LiDAR via Future Object Detection

paper | code

[15] Point2Seq: Detecting 3D Objects as Sequences

paper | code

[14] MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection

paper | code

[13] TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

paper | code

[12] Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds

paper | code

[11] Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion

paper

[10] MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

paper | code

[9] Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds

paper | code

[8] VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention

paper | code

[7] MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection

paper | code

[6] DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

paper | code

[5] Point Density-Aware Voxels for LiDAR 3D Object Detection

paper | code

[4] Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement

paper | code

[3] Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes

paper | code

[2] A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation

keywords: 3D Object Detection with Point-based Methods, 3D Object Detection with Grid-based Methods, Cluster-free 3D Panoptic Segmentation, CenterPoint 3D Object Detection

paper

[1] Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving

keywords: Autonomous Driving, Monocular 3D Object Detection

paper | code

Human-Object Interaction (HOI) Detection

[2] MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

paper

[1] Efficient Two-Stage Detection of Human-Object Interactions with a Novel Unary-Pairwise Transformer

paper | project

Camouflaged Object Detection

[2] Implicit Motion Handling for Video Camouflaged Object Detection

paper | dataset

[1] Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection

paper | code

Rotated Object Detection

Salient Object Detection

[2] Bi-directional Object-context Prioritization Learning for Saliency Ranking

paper | code

[1] Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection

paper

Keypoint Detection

[1] UKPGAN: A General Self-Supervised Keypoint Detector

paper | code

Lane Detection

[2] CLRNet: Cross Layer Refinement Network for Lane Detection

paper

[1] Rethinking Efficient Lane Detection via Curve Modeling

keywords: Segmentation-based Lane Detection, Point Detection-based Lane Detection, Curve-based Lane Detection, autonomous driving

paper | code

Edge Detection

[1] EDTER: Edge Detection with Transformer

paper | code

Vanishing Point Detection

[1] Deep vanishing point detection: Geometric priors make dataset variations vanish

paper | code

Anomaly Detection

[5] Catching Both Gray and Black Swans: Open-set Supervised Anomaly Detection

paper | code

[4] UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection

paper | code

[3] ViM: Out-Of-Distribution with Virtual-logit Matching (OOD detection)

paper | code

[2] Generative Cooperative Learning for Unsupervised Video Anomaly Detection

paper

[1] Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection (paper not yet uploaded)

paper | code

Segmentation

Image Segmentation

[5] Progressive Minimal Path Method with Embedded CNN

paper

[4] Revisiting Near/Remote Sensing with Geospatial Attention

paper

[3] Learning What Not to Segment: A New Perspective on Few-Shot Segmentation

paper | code

[2] CRIS: CLIP-Driven Referring Image Segmentation

paper

[1] Hyperbolic Image Segmentation

paper

Panoptic Segmentation

[2] Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

paper | code

[1] Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation

keywords: Semantic- and panoramic segmentation, Unsupervised domain adaptation, Transformer

paper | code

Semantic Segmentation

[21] FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation (Oral)

paper | project

[20] WildNet: Learning Domain Generalized Semantic Segmentation from the Wild

paper | code

[19] Rethinking Semantic Segmentation: A Prototype View (Oral)

paper | code

[18] DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

paper | code

[17] Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation

paper | code

[16] Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation

paper

[15] Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation

paper | code

[14] Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation

paper | code

[13] Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation

paper | code

[12] Scribble-Supervised LiDAR Semantic Segmentation

paper | code

[11] ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation

paper

[10] Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast

paper

[9] Representation Compensation Networks for Continual Semantic Segmentation

paper | code

[8] Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

paper | code | project

[7] Weakly Supervised Semantic Segmentation using Out-of-Distribution Data

paper | code

[6] Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation

paper | code

[5] Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

paper | code

[4] Cross Language Image Matching for Weakly Supervised Semantic Segmentation

paper

[3] Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers

paper | code

[2] ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation

keywords: Semi-supervised learning, Semantic segmentation, Uncertainty estimation

paper | code

[1] Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation

paper | code

Instance Segmentation

[12] Sparse Object-level Supervision for Instance Segmentation with Pixel Embeddings

paper | code

[11] Relieving Long-tailed Instance Segmentation via Pairwise Class Balance

paper | code

[10] Beyond Semantic to Instance Segmentation: Weakly-Supervised Instance Segmentation via Semantic Knowledge Transfer and Self-Refinement

paper | code

[9] Noisy Boundaries: Lemon or Lemonade for Semi-supervised Instance Segmentation?

paper

[8] SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation

paper | project

[7] Sparse Instance Activation for Real-Time Instance Segmentation

paper | code

[6] Mask Transfiner for High-Quality Instance Segmentation

paper | code

[5] ContrastMask: Contrastive Learning to Segment Every Thing

paper

[4] Discovering Objects that Can Move

paper | code

[3] E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

paper | code

[2] Efficient Video Instance Segmentation via Tracklet Query and Proposal

paper

[1] SoftGroup for 3D Instance Segmentation on Point Clouds

keywords: 3D Vision, Point Clouds, Instance Segmentation

paper | code

Superpixel

Video Object Segmentation

[1] Language as Queries for Referring Video Object Segmentation

paper | code

Matting

Dense Prediction

[1] DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

paper | code

Video Processing

[5] Bringing Old Films Back to Life

paper | code

[4] Time Lens++: Event-based Frame Interpolation with Parametric Non-linear Flow and Multi-scale Fusion

paper | project | video | dataset

[3] Long-term Video Frame Interpolation via Feature Propagation

paper

[2] Unifying Motion Deblurring and Frame Interpolation with Events

paper

[1] Neural Compression-Based Feature Learning for Video Restoration

paper

Video Editing

[1] M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers

paper

Video Generation/Video Synthesis

[2] Depth-Aware Generative Adversarial Network for Talking Head Video Generation

paper | code

[1] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

paper | code

Video Super-Resolution

[1] Reference-based Video Super-Resolution Using Multi-Camera Video Triplets

paper | code

Estimation

Optical Flow/Motion Estimation

[2] Global Matching with Overlapping Attention for Optical Flow Estimation

paper | code

[1] CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation

paper

Depth Estimation

[17] Degradation-agnostic Correspondence from Resolution-asymmetric Stereo

paper

[16] P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior

paper | code

[15] Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry (Oral)

paper | code

[14] Learning Structured Gaussians to Approximate Deep Ensembles

paper

[13] LGT-Net: Indoor Panoramic Room Layout Estimation with Geometry-Aware Transformer Network (layout estimation)

paper | code

[12] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation

paper | project

[11] Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light

paper | code

[10] RGB-Depth Fusion GAN for Indoor Depth Completion

paper

[9] Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective

paper

[8] Deep Depth from Focus with Differential Focus Volume

paper

[7] ChiTransformer: Towards Reliable Stereo from Cues

paper

[6] Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation and Focal Loss

paper | code

[5] ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks

keywords: Learning-based Stereo Matching Networks, Single Domain Generalization, Shortcut Learning

paper

[4] Attention Concatenation Volume for Accurate and Efficient Stereo Matching

keywords: Stereo Matching, cost volume construction, cost aggregation

paper | code

[3] Occlusion-Aware Cost Constructor for Light Field Depth Estimation

paper | [code](https://github.com/YingqianWang/OACC-Net)

[2] NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

keywords: Neural CRFs for Monocular Depth

paper

[1] OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion

keywords: monocular depth estimation, transformer

paper

Human Parsing/Human Pose Estimation

[11] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

paper | code

[10] Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes

paper | code

[9] Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization

paper | code

[8] Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video

paper | video

[7] Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors

paper | project

[6] Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation

paper

[5] MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

paper | code

[4] CDGNet: Class Distribution Guided Network for Human Parsing

paper

[3] Forecasting Characteristic 3D Poses of Human Actions

paper | project | video

[2] Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation

keywords: Top-Down Pose Estimation, Limb-based Grouping, Direct Regression

paper

[1] MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

keywords: 3D Human Pose Estimation, Transformer

paper

Gesture Estimation

[1] ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis

paper | code

Image Processing

Super-Resolution

[10] High-Resolution Image Harmonization via Collaborative Dual Transformations

paper | code

[9] Deep Constrained Least Squares for Blind Image Super-Resolution

paper

[8] Local Texture Estimator for Implicit Representation Function

paper

[7] A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution

paper | code

[6] Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution

paper | code

[5] Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel

paper | code

[4] Reflash Dropout in Image Super-Resolution

paper

[3] Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence

paper

[2] HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening

paper | code

[1] HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging

keywords: HSI Reconstruction, Self-Attention Mechanism, Image Frequency Spectrum Analysis

paper

Image Restoration/Image Enhancement/Image Reconstruction

[7] HyperInverter: Improving StyleGAN Inversion via Hypernetwork

paper | project

[6] Diverse Plausible 360-Degree Image Outpainting for Efficient 3DCG Background Creation

paper | project

[5] Exploring and Evaluating Image Restoration Potential in Dynamic Scenes

paper

[4] Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction

paper

[3] Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction

paper | code

[2] Restormer: Efficient Transformer for High-Resolution Image Restoration

paper | code

[1] Event-based Video Reconstruction via Potential-assisted Spiking Neural Network

paper

Image Shadow Removal/Image Reflection Removal

Image Denoising/Deblurring/Deraining/Dehazing

[6] CVF-SID: Cyclic multi-Variate Function for Self-Supervised Image Denoising by Disentangling Noise from Image

paper | code

[5] Unpaired Deep Image Deraining Using Dual Contrastive Learning

paper | code

[4] AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network

paper | code

[3] IDR: Self-Supervised Image Denoising via Iterative Data Refinement

paper | code

[2] Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots

paper | code

[1] E-CIR: Event-Enhanced Continuous Intensity Recovery

keywords: Event-Enhanced Deblurring, Video Representation

paper | code

Image Editing/Image Inpainting

[6] HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing

paper | project

[5] High-Fidelity GAN Inversion for Image Attribute Editing

paper | code | project

[4] Style Transformer for Image Inversion and Editing

paper | code

[3] MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting

paper | code

[2] HairCLIP: Design Your Hair by Text and Reference Image

keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks

paper | project

[1] Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding

keywords: Image Inpainting, Transformer, Image Generation

paper | code

Image Translation

[5] Maximum Spatial Perturbation Consistency for Unpaired Image-to-Image Translation

paper | code

[4] Globetrotter: Connecting Languages by Connecting Images

paper

[3] QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation

paper | code

[2] FlexIT: Towards Flexible Semantic Image Translation

paper

[1] Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks

keywords: image translation, knowledge transfer, Contrastive learning

paper

Image Quality Assessment

Style Transfer

[5] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer

paper | code | project

[4] Industrial Style Transfer with Large-scale Geometric Warping and Content Preservation

paper | project | code

[3] Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization

paper | code

[2] Style-ERD: Responsive and Coherent Online Motion Style Transfer

paper

[1] CLIPstyler: Image Style Transfer with a Single Text Condition

keywords: Style Transfer, Text-guided synthesis, Language-Image Pre-Training (CLIP)

paper

Face

[6] ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations

paper

[5] Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?

paper | project

[4] Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data

paper | code

[3] HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network

paper

[2] FaceFormer: Speech-Driven 3D Facial Animation with Transformers

paper | code

[1] Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning

paper | code

Facial Recognition/Detection

[4] DeepFace-EMD: Re-ranking Using Patch-wise Earth Mover's Distance Improves Out-Of-Distribution Face Identification

paper | code

[3] Towards Semi-Supervised Deep Facial Expression Recognition with An Adaptive Confidence Margin

paper | code

[2] Privacy-preserving Online AutoML for Domain-Specific Face Detection

paper

[1] An Efficient Training Approach for Very Large Scale Face Recognition

paper | code

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[4] TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

paper | code | project

[3] FENeRF: Face Editing in Neural Radiance Fields

paper | project

[2] GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors

paper

[1] Sparse to Dense Dynamic 3D Facial Expression Generation

keywords: Facial expression generation, 4D face generation, 3D face modeling

paper

Face Forgery/Face Anti-Spoofing

[4] Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

paper | code

[3] Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing

paper | code

[2] Voice-Face Homogeneity Tells Deepfake

paper | code

[1] Protecting Celebrities from DeepFake with Identity Consistency Transformer

paper | code

Object Tracking

[9] Unsupervised Learning of Accurate Siamese Tracking

paper | code

[8] Global Tracking Transformers

paper | code

[7] Transforming Model Prediction for Tracking

paper | code

[6] MixFormer: End-to-End Tracking with Iterative Mixed Attention

paper | code

[5] Unsupervised Domain Adaptation for Nighttime Aerial Tracking

paper | code

[4] Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects

paper | [code](https://github.com/DLR-RM/3DObjectTracking)

[3] TCTrack: Temporal Contexts for Aerial Tracking

paper | code

[2] Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds

keywords: Single Object Tracking, 3D Multi-object Tracking / Detection, Spatial-temporal Learning on Point Clouds

paper

[1] Correlation-Aware Deep Tracking

paper

Image & Video Retrieval/Video Understanding

[7] Correlation Verification for Image Retrieval (Oral)

paper | code

[6] It's About Time: Analog Clock Reading in the Wild

paper | project

[5] Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval

paper | code

[4] Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

paper

[3] Sketch3T: Test-Time Training for Zero-Shot SBIR

paper

[2] Bridging Video-text Retrieval with Multiple Choice Questions

paper | code

[1] BEVT: BERT Pretraining of Video Transformers

keywords: Video understanding, Vision transformers, Self-supervised representation learning, BERT pretraining

paper | code

Action/Activity Recognition/Detection/Segmentation/Localization

[18] UnweaveNet: Unweaving Activity Stories

paper | [code](https://github.com/willprice/activity-stories)

[17] Dual-AI: Dual-path Action Interaction Learning for Group Activity Recognition (Oral)

paper | project

[16] Detector-Free Weakly Supervised Group Activity Recognition

paper

[15] MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

paper | code

[14] Unsupervised Pre-training for Temporal Action Localization Tasks

paper | code

[13] Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos

paper

[12] How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

paper

[11] E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition

paper

[10] Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos

paper | code

[9] DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

paper

[8] Self-supervised Video Transformer

paper | code

[7] Spatio-temporal Relation Modeling for Few-shot Action Recognition

paper | code

[6] RCL: Recurrent Continuous Localization for Temporal Action Detection

paper

[5] OpenTAL: Towards Open Set Temporal Action Localization

paper | code

[4] End-to-End Semi-Supervised Learning for Video Action Detection

paper

[3] Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos

paper

[2] Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation

paper | code

[1] Colar: Effective and Efficient Online Action Detection by Consulting Exemplars

keywords: Online action detection

paper

Person Re-Identification/Detection

[4] Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification

paper | [code](https://github.com/ftd-Wuchao/CCSFG)

[3] Large-Scale Pre-training for Person Re-identification with Noisy Labels

paper | code

[2] Part-based Pseudo Label Refinement for Unsupervised Person Re-identification

paper | code

[1] Cascade Transformers for End-to-End Person Search

paper | code

Image/Video Captioning

[6] Quantifying Societal Bias Amplification in Image Captioning

paper

[5] NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge

paper

[4] SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning

paper | code

[3] Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources

paper | code

[2] Hierarchical Modular Network for Video Captioning

paper | code

[1] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning

keywords: Image Captioning and Dense Captioning; Knowledge distillation; Transformer; 3D Vision

paper

Medical Imaging

[8] Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis

paper | code

[7] Incremental Cross-view Mutual Distillation for Self-supervised Medical CT Synthesis

paper

[6] DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification

paper | code

[5] ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification

paper

[4] Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces from 3D MRI Scans with Geometric Deep Neural Networks

paper | code

[3] Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization

paper | code

[2] Adaptive Early-Learning Correction for Segmentation from Noisy Annotations

keywords: medical-imaging segmentation, Noisy Annotations

paper | code

[1] Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations

keywords: Self-supervised Transformer, Temporal modeling of disease progression

paper

Text Detection/Recognition/Understanding

[5] Text Spotting Transformers

paper | [code](https://github.com/mlpc-ucsd/TESTR)

[4] Syntax-Aware Network for Handwritten Mathematical Expression Recognition

paper

[3] SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition

paper | code

[2] Fourier Document Restoration for Robust Document Dewarping and Recognition

paper | code

[1] XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

paper

Remote Sensing Image

[1] Exploiting Temporal Relations on Radar Perception for Autonomous Driving

paper

GAN/Generative/Adversarial

[16] GAN-Supervised Dense Visual Alignment (Oral)

paper | code | project

[15] Towards Robust Rain Removal Against Adversarial Attacks: A Comprehensive Benchmark Analysis and Beyond

paper | code

[14] Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training

paper | code

[13] Feature Statistics Mixing Regularization for Generative Adversarial Networks

paper | code

[12] Subspace Adversarial Training

paper | code

[11] DTA: Physical Camouflage Attacks using Differentiable Transformation Network

paper | code

[10] Improving the Transferability of Targeted Adversarial Examples through Object-Based Diverse Input

paper | code

[9] Towards Practical Certifiable Patch Defense with Vision Transformer

paper

[8] Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment

paper

[7] Enhancing Adversarial Training with Second-Order Statistics of Weights

paper | code

[6] Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack

paper | code1 | code2

[5] Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity

paper

[4] Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon

paper

[3] Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer(保护面部隐私：通过风格稳健的化妆转移生成对抗性身份面具)

paper

[2] Adversarial Texture for Fooling Person Detectors in the Physical World(物理世界中愚弄人探测器的对抗性纹理)

paper

[1] Label-Only Model Inversion Attacks via Boundary Repulsion(通过边界排斥的仅标签模型反转攻击)

paper

图像生成/图像合成(Image Generation/Image Synthesis)

[13] Exemplar-based Pattern Synthesis with Implicit Periodic Field Network(具有隐式周期场网络的示例模式合成)

paper

[12] Styleformer: Transformer based Generative Adversarial Networks with Style Vector(具有样式向量的基于 Transformer 的生成对抗网络)

paper | code

[11] Modulated Contrast for Versatile Image Synthesis(用于多功能图像合成的调制对比度)

paper | code

[10] Attribute Group Editing for Reliable Few-shot Image Generation(属性组编辑用于可靠的小样本图像生成)

paper | code

[9] Text to Image Generation with Semantic-Spatial Aware GAN(使用语义空间感知 GAN 生成文本到图像)

paper | code

[8] Playable Environments: Video Manipulation in Space and Time(可播放环境：空间和时间的视频操作)

paper | code

[7] FLAG: Flow-based 3D Avatar Generation from Sparse Observations(从稀疏观察中生成基于流的 3D 头像)

paper | project

[6] Dynamic Dual-Output Diffusion Models(动态双输出扩散模型)

paper

[5] Exploring Dual-task Correlation for Pose Guided Person Image Generation(探索姿势引导人物图像生成的双任务相关性)

paper | code

[4] 3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces(基于小批量特征交换的三维形状变化自动编码器潜在解纠缠)

paper | code

[3] Interactive Image Synthesis with Panoptic Layout Generation(具有全景布局生成的交互式图像合成)

[paper](https://arxiv.org/abs/2203.02104)

[2] Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values(极性采样：通过奇异值对预训练生成网络的质量和多样性控制)

paper | demo

[1] Autoregressive Image Generation using Residual Quantization(使用残差量化的自回归图像生成)

paper | code

三维视觉(3D Vision)

[5] Fast Point Transformer

paper | project

[4] Towards Implicit Text-Guided 3D Shape Generation(迈向隐式文本引导的 3D 形状生成)

paper | code

[3] The Neurally-Guided Shape Parser: Grammar-based Labeling of 3D Shape Regions with Approximate Inference(神经引导的形状解析器：具有近似推理的 3D 形状区域的基于语法的标记)

paper | code

[2] Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings(在 3D 网格中嵌入消息并从 2D 渲染中提取它们)

paper

[1] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning(使用 Transformer 进行 3D 密集字幕的跨模式知识迁移)

keywords: Image Captioning/Dense Captioning, Knowledge Distillation, Transformer, 3D Vision

paper

点云(Point Cloud)

[14] REGTR: End-to-end Point Cloud Correspondences with Transformers(与 Transformer 的端到端点云匹配)

paper | code

[13] Stratified Transformer for 3D Point Cloud Segmentation(用于 3D 点云分割的分层transformer)

paper | code

[12] AziNorm: Exploiting the Radial Symmetry of Point Cloud for Azimuth-Normalized 3D Perception(利用点云的径向对称性进行方位归一化 3D 感知)

paper | code

[11] WarpingGAN: Warping Multiple Uniform Priors for Adversarial 3D Point Cloud Generation(为对抗性 3D 点云生成扭曲多个均匀先验)

paper | code

[10] IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment(通过深度嵌入对齐的动态 3D 点云插值)

paper | code

[9] No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces(没有痛苦，收获很大：通过拟合特征级时空表面，用静态模型对动态点云序列进行分类)

paper | code

[8] AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation(通用 3D 零件分割的中间监督搜索)

paper

[7] Geometric Transformer for Fast and Robust Point Cloud Registration(用于快速和稳健点云配准的几何transformer)

paper | code

[6] Contrastive Boundary Learning for Point Cloud Segmentation(点云分割的对比边界学习)

paper | code

[5] Shape-invariant 3D Adversarial Point Clouds(形状不变的 3D 对抗点云)

paper | code

[4] ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation(通过对抗旋转提高点云分类器的旋转鲁棒性)

paper

[3] Lepard: Learning partial point cloud matching in rigid and deformable scenes(Lepard：在刚性和可变形场景中学习部分点云匹配)

paper | code

[2] A Unified Query-based Paradigm for Point Cloud Understanding(一种基于统一查询的点云理解范式)

paper

[1] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding(用于 3D 点云理解的自监督跨模态对比学习)

keywords: Self-Supervised Learning, Contrastive Learning, 3D Point Cloud, Representation Learning, Cross-Modal Learning

paper | code

三维重建(3D Reconstruction)

[17] I M Avatar: Implicit Morphable Head Avatars from Videos(视频中的隐式可变形头部头像)(Oral)

paper | project

[16] BNV-Fusion: Dense 3D Reconstruction using Bi-level Neural Volume Fusion(使用双层神经体积融合的密集 3D 重建)

paper

[15] SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video(从单目视频自我重建你的数字化身)(Oral)

paper | code

[14] LISA: Learning Implicit Shape and Appearance of Hands(学习手的隐式形状和外观)

paper | project

[13] BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information(通过利用品种信息学习从图像中回归 3D 狗形状)

paper | code

[12] Uncertainty-Aware Deep Multi-View Photometric Stereo(不确定性感知深度多视图光度立体)

paper

[11] Neural Reflectance for Shape Recovery with Shadow Handling(使用阴影处理进行形状恢复的神经反射)

paper | code

[10] PLAD: Learning to Infer Shape Programs with Pseudo-Labels and Approximate Distributions(学习用伪标签和近似分布推断形状程序)

paper | code

[9] ϕ-SfT: Shape-from-Template with a Physics-Based Deformation Model(具有基于物理的变形模型的模板形状)

paper | code

[8] Input-level Inductive Biases for 3D Reconstruction(用于 3D 重建的输入级归纳偏差)

paper

[7] AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation(用于 3D 完成、重建和生成的形状先验)

paper | project

[6] Interacting Attention Graph for Single Image Two-Hand Reconstruction(单幅图像双手重建的交互注意力图)

paper | code

[5] OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction(实时动态 3D 重建的遮挡感知运动估计)

paper | project

[4] Neural RGB-D Surface Reconstruction(神经 RGB-D 表面重建)

paper | project | video

[3] Neural Face Identification in a 2D Wireframe Projection of a Manifold Object(流形对象的二维线框投影中的神经人脸识别)

paper | [code](https://manycore-research.github.io/faceformer) | project

[2] Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers(使用伤口分割和重建生成 3D 生物可打印贴片以治疗糖尿病足溃疡)

keywords: semantic segmentation, 3D reconstruction, 3D bio-printers

paper

[1] H4D: Human 4D Modeling by Learning Neural Compositional Representation(通过学习神经组合表示进行人体 4D 建模)

keywords: 4D Representation(4D 表征), Human Body Estimation(人体姿态估计), Fine-grained Human Reconstruction(细粒度人体重建)

paper

场景重建/视图合成/新视角合成(Novel View Synthesis)

[17] RayMVSNet: Learning Ray-based 1D Implicit Fields for Accurate Multi-View Stereo(学习基于光线的 1D 隐式场以实现准确的多视图立体)

paper

[16] Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis(用于可控 3D 人体合成的表面对齐神经辐射场)

paper | project

[15] IRON: Inverse Rendering by Optimizing Neural SDFs and Materials from Photometric Images(通过优化来自光度图像的神经 SDF 和材料进行反向渲染)

paper | project

[14] MonoScene: Monocular 3D Semantic Scene Completion(单目 3D 语义场景完成)

paper | code | project

[13] Stereo Magnification with Multi-Layer Images(具有多层图像的立体放大)

paper | code

[12] Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations(通过集合潜在场景表示的无几何新颖视图合成)

paper | project

[11] Neural Rays for Occlusion-aware Image-based Rendering(用于遮挡感知的基于图像的渲染的神经射线)

paper | project | code

[10] Deblur-NeRF: Neural Radiance Fields from Blurry Images(来自模糊图像的神经辐射场)

paper | code

[9] NPBG++: Accelerating Neural Point-Based Graphics(加速基于神经点的图形)

paper | project

[8] PlaneMVS: 3D Plane Reconstruction from Multi-View Stereo(从多视图立体重建 3D 平面)

paper

[7] NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction(用于大规模场景重建的融合辐射场)

paper

[6] GeoNeRF: Generalizing NeRF with Geometry Priors(用几何先验概括 NeRF)

paper | code

[5] StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions(室内 3D 场景重建的风格转换)

paper | code | project

[4] Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image(向外看：从单个图像合成一致的长期 3D 场景视频)

paper | code | project

[3] Point-NeRF: Point-based Neural Radiance Fields(基于点的神经辐射场)

paper | code | project

[2] CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields(文本和图像驱动的神经辐射场操作)

keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)

paper | code

模型压缩(Model Compression)

知识蒸馏(Knowledge Distillation)

[4] Decoupled Knowledge Distillation(解耦知识蒸馏)

paper | code

[3] Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation(小波知识蒸馏：迈向高效的图像到图像转换)

paper

[2] Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability(知识蒸馏作为高效的预训练：更快的收敛、更高的数据效率和更好的可迁移性)

paper | code

[1] Focal and Global Knowledge Distillation for Detectors(探测器的焦点和全局知识蒸馏)

keywords: Object Detection, Knowledge Distillation

paper | code

剪枝(Pruning)

[2] CHEX: CHannel EXploration for CNN Model Compression(CNN模型压缩的通道探索)

paper | code

[1] Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs(空间剪枝：使用自适应滤波器表示来改进稀疏 CNN 的训练)

paper

量化(Quantization)

[3] It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher(一切尽在老师身上：零样本量化更贴近老师)(Oral)

paper

[2] Implicit Feature Decoupling with Depthwise Quantization(使用深度量化的隐式特征解耦)

paper

[1] IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization(学习具有类内异质性的合成图像以进行零样本网络量化)

paper | code

神经网络结构设计(Neural Network Structure Design)

[2] DyRep: Bootstrapping Training with Dynamic Re-parameterization(使用动态重新参数化的引导训练)

paper | code

[1] BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning(学习探索样本关系以进行鲁棒表征学习)

keywords: sample relationship, data scarcity learning, Contrastive Self-Supervised Learning, long-tailed recognition, zero-shot learning, domain generalization, self-supervised learning

paper | code

CNN

[5] TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing(用于布局感知视觉处理的高效翻译变体卷积)(动态卷积)

paper | code

[4] On the Integration of Self-Attention and Convolution(自注意力和卷积的整合)

paper | code1 | code2

[3] Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs(将内核扩展到 31x31：重新审视 CNN 中的大型内核设计)

paper | code

Explainer: Why can a 31x31 convolution kernel run about as fast as a 9x9 convolution?

Explainer: RepLKNet: large-kernel convolution + structural re-parameterization make CNNs great again

[2] DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos(视频中稀疏帧差异的端到端 CNN 推断)

keywords: sparse convolutional neural network, video inference accelerating

paper

[1] A ConvNet for the 2020s

paper | code

Explainer: A ConvNet "renaissance": ConvNets make a comeback and beat Transformers! FAIR redesigns a new pure-convolution architecture

Transformer

[9] Patch Slimming for Efficient Vision Transformers(高效视觉transformer的补丁瘦身)

paper

[8] CodedVTR: Codebook-based Sparse Voxel Transformer with Geometric Guidance(具有几何制导的基于码本的稀疏体素transformer)

paper

[7] MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens(通过操作信使token交换本地空间信息)

paper | code

[6] BoxeR: Box-Attention for 2D and 3D Transformers(用于 2D 和 3D transformer的 Box-Attention)

paper | code

[5] Bootstrapping ViTs: Towards Liberating Vision Transformers from Pre-training(引导 ViT：从预训练中解放视觉transformer)

paper | code

[4] Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning

paper | code

[3] NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition(在视觉transformer中为视觉识别指定协同上下文)

paper | code

[2] Delving Deep into the Generalization of Vision Transformers under Distribution Shifts(深入研究分布变化下的视觉Transformer的泛化)

keywords: out-of-distribution (OOD) generalization, Vision Transformers

paper | code

[1] Mobile-Former: Bridging MobileNet and Transformer(连接 MobileNet 和 Transformer)

keywords: Light-weight convolutional neural networks(轻量卷积神经网络), Combination of CNN and ViT

paper

图神经网络(GNN)

[2] Improving Subgraph Recognition with Variational Graph Information Bottleneck(利用变分图信息瓶颈改进子图识别)

paper | code

[1] AEGNN: Asynchronous Event-based Graph Neural Networks(基于异步事件的图神经网络)

paper | project

神经网络架构搜索(NAS)

[4] Demystifying the Neural Tangent Kernel from a Practical Perspective: Can it be trusted for Neural Architecture Search without training?(从实用的角度揭开神经切线内核的神秘面纱：无需训练就可以信任神经架构搜索吗？)

paper | code

[3] Training-free Transformer Architecture Search(免训练transformer架构搜索)

paper

[2] Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning(MAML 的全局收敛和受理论启发的神经架构搜索以进行 Few-Shot 学习)

paper | code

[1] β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search(可微架构搜索的 Beta-Decay 正则化)

paper

MLP

[4] Brain-inspired Multilayer Perceptron with Spiking Neurons(具有尖峰神经元的类脑多层感知器)

paper | code

[3] Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information(利用地理和时间信息进行细粒度图像分类的动态 MLP)

paper | code

[2] Revisiting the Transferability of Supervised Pretraining: an MLP Perspective(重新审视监督预训练的可迁移性：MLP 视角)

paper

[1] An Image Patch is a Wave: Quantum Inspired Vision MLP(图像补丁是波浪：量子启发的视觉 MLP)

paper | code1 | code2

数据处理(Data Processing)

[2] Generating High Fidelity Data from Low-density Regions using Diffusion Models(使用扩散模型从低密度区域生成高保真数据)

paper

[1] Dataset Distillation by Matching Training Trajectories(通过匹配训练轨迹进行数据集蒸馏)(数据集蒸馏)

paper | code | project

数据增广(Data Augmentation)

[3] EnvEdit: Environment Editing for Vision-and-Language Navigation(视觉语言导航的环境编辑)

paper | code

[2] TeachAugment: Data Augmentation Optimization Using Teacher Knowledge(使用教师知识进行数据增强优化)

paper | code

[1] 3D Common Corruptions and Data Augmentation(3D 常见损坏和数据增强)(Oral)

keywords: Data Augmentation, Image restoration, Photorealistic image synthesis

paper | project

归一化/正则化(Batch Normalization)

[1] Delving into the Estimation Shift of Batch Normalization in a Network(深入研究网络中批量标准化的估计偏移)

paper | code

图像聚类(Image Clustering)

[1] RAMA: A Rapid Multicut Algorithm on GPU(GPU 上的快速多切算法)

paper | code

图像压缩(Image Compression)

[4] Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression(用于高效神经图像压缩的统一多元高斯混合)

paper | code

[3] ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding(具有不均匀分组的空间通道上下文自适应编码的高效学习图像压缩)

paper

[2] The Devil Is in the Details: Window-based Attention for Image Compression(细节中的魔鬼：图像压缩的基于窗口的注意力)

paper | code

[1] Neural Data-Dependent Transform for Learned Image Compression(用于学习图像压缩的神经数据相关变换)

paper | code | project

模型训练/泛化(Model Training/Generalization)

[11] Parameter-free Online Test-time Adaptation(无参数在线测试时间自适应)(Oral)

paper | code

[10] SNUG: Self-Supervised Neural Dynamic Garments(自我监督的神经动态服装)(Oral)

paper | project

[9] Automated Progressive Learning for Efficient Training of Vision Transformers(用于高效训练视觉transformer的自动渐进式学习)

paper | code

[8] GradViT: Gradient Inversion of Vision Transformers(视觉transformer的梯度反转)

paper | project

[7] Recall@k Surrogate Loss with Large Batches and Similarity Mixup(大批量和相似性混合的 Recall@k 代理损失)

paper

[6] Out-of-distribution Generalization with Causal Invariant Transformations(具有因果不变变换的分布外泛化)

paper

[5] Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective(神经网络可以两次学习相同的模型吗？从决策边界的角度研究可重复性和双重下降)

paper | code

[4] Towards Efficient and Scalable Sharpness-Aware Minimization(迈向高效和可扩展的锐度感知最小化)

keywords: Sharp Local Minima, Large-Batch Training

paper

[3] CAFE: Learning to Condense Dataset by Aligning Features(通过对齐特征学习压缩数据集)

keywords: dataset condensation, coreset selection, generative models

paper | code

[2] The Devil is in the Margin: Margin-based Label Smoothing for Network Calibration(魔鬼在边缘：用于网络校准的基于边缘的标签平滑)

paper | code

[1] DN-DETR: Accelerate DETR Training by Introducing Query DeNoising(通过引入查询去噪加速 DETR 训练)

keywords: Detection Transformer

paper | code

噪声标签(Noisy Label)

[3] UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning(通过统一选择和对比学习来对抗标签噪声)

paper | code

[2] Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels(带有噪声标签的学习中噪声检测的可扩展惩罚回归)

paper | code

长尾分布(Long-Tailed Distribution)

[1] Targeted Supervised Contrastive Learning for Long-Tailed Recognition(用于长尾识别的有针对性的监督对比学习)

keywords: Long-Tailed Recognition(长尾识别), Contrastive Learning(对比学习)

paper

图像特征提取与匹配(Image feature extraction and matching)

[1] Probabilistic Warp Consistency for Weakly-Supervised Semantic Correspondences(弱监督语义对应的概率扭曲一致性)

paper | code

视觉表征学习(Visual Representation Learning)

[4] Node Representation Learning in Graph via Node-to-Neighbourhood Mutual Information Maximization(通过节点到邻域互信息最大化的图中节点表示学习)

paper | code

[3] SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization(通过相似性感知归一化探索场景文本的自监督表示学习)

paper

[2] Exploring Set Similarity for Dense Self-supervised Representation Learning(探索密集自监督表示学习的集合相似性)

paper

[1] Motion-aware Contrastive Video Representation Learning via Foreground-background Merging(通过前景-背景合并的运动感知对比视频表示学习)

paper | code

模型评估(Model Evaluation)

[1] MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound(通过视觉、语言和声音的神经脚本知识)

paper | project

视听学习(Audio-visual Learning)

[4] Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and Language(具有跨模态注意力和语言的视听广义零样本学习)

paper | code

[3] Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes(自监督预测学习：视觉场景中声源定位的无负法方法)(视觉定位)

paper | code

[2] Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation(用于协同语音手势生成的学习分层跨模式关联)

paper | project

[1] UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection(用于联合视频时刻检索和高光检测的统一多模态transformer)

paper | code

视觉-语言(Vision-Language)

[14] DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation(用于鲁棒图像处理的文本引导扩散模型)

paper | code

[13] StyleT2I: Toward Compositional and High-Fidelity Text-to-Image Synthesis(走向合成和高保真文本到图像的合成)

paper

[12] LiT: Zero-Shot Transfer with Locked-image text Tuning(带锁定图像文本调整的零样本迁移)

paper

[11] VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks(视觉和语言任务的参数高效迁移学习)

paper | code

[10] Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model(预测、预防和评估：由预训练的视觉语言模型支持的解耦的文本驱动图像处理)

paper | code

[9] LAFITE: Towards Language-Free Training for Text-to-Image Generation(面向文本到图像生成的无语言培训)

paper | code

[8] An Empirical Study of Training End-to-End Vision-and-Language Transformers(培训端到端视觉和语言transformer的实证研究)

paper | code

[7] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding(为视觉基础生成伪语言查询)

paper | code

[6] Conditional Prompt Learning for Vision-Language Models(视觉语言模型的条件提示学习)

paper | code

[5] NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks(视觉和视觉语言任务中的自然语言解释模型)

paper | code

[4] L-Verse: Bidirectional Generation Between Image and Text(图像和文本之间的双向生成) (Oral Presentation)

paper

[3] HairCLIP: Design Your Hair by Text and Reference Image(通过文本和参考图像设计你的头发)

keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks

paper | project

[1] Vision-Language Pre-Training with Triple Contrastive Learning(三重对比学习的视觉语言预训练)

keywords: Vision-language representation learning, Contrastive Learning

paper | code

视觉预测(Vision-based Prediction)

[12] Multi-Person Extreme Motion Prediction(多人极限运动预测)

paper | code and dataset

[11] Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos(以自我为中心的视频的联合手部运动和交互热点预测)

paper | project

[10] Vehicle trajectory prediction works, but not everywhere(车辆轨迹预测有效，但并非无处不在)

paper | code

[9] Stochastic Trajectory Prediction via Motion Indeterminacy Diffusion(基于运动不确定性扩散的随机轨迹预测)

paper | code

[8] Non-Probability Sampling Network for Stochastic Human Trajectory Prediction(用于随机人体轨迹预测的非概率采样网络)

paper | code

[7] Remember Intentions: Retrospective-Memory-based Trajectory Prediction(记住意图：基于回顾性记忆的轨迹预测)

paper | code

[6] GaTector: A Unified Framework for Gaze Object Prediction(凝视对象预测的统一框架)

paper

[5] On Adversarial Robustness of Trajectory Prediction for Autonomous Vehicles(自动驾驶汽车轨迹预测的对抗鲁棒性)

paper | code

[4] Adaptive Trajectory Prediction via Transferable GNN(基于可迁移 GNN 的自适应轨迹预测)

paper

[3] Towards Robust and Adaptive Motion Forecasting: A Causal Representation Perspective(迈向稳健和自适应运动预测：因果表示视角)

paper | code

[2] How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting(多少个观察就足够了？轨迹预测的知识蒸馏)

keywords: Knowledge Distillation, trajectory forecasting

paper

[1] Motron: Multimodal Probabilistic Human Motion Forecasting(多模式概率人体运动预测)

paper

数据集(Dataset)

[16] Multi-Person Extreme Motion Prediction(多人极限运动预测)(人体交互数据集)

paper | code and dataset

[15] ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer(用于 Sim2Real 传输的多感官对象数据集)

paper | project | dataset

[14] Rethinking Visual Geo-localization for Large-Scale Applications(重新思考大规模应用程序的视觉地理定位)

paper | Dataset, code and trained models

[13] Deep Image-based Illumination Harmonization(基于深度图像的照明协调)

paper | dataset

[12] OakInk: A Large-scale Knowledge Repository for Understanding Hand-Object Interaction(理解手物交互的大规模知识库)

paper | datasets&code

[11] Instance-wise Occlusion and Depth Orders in Natural Scenes(自然场景中的实例遮挡和深度顺序)

paper | code

[10] Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities(用于理解程序活动的大规模多视图视频数据集)

paper | project

[9] Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task(用于自动驾驶和单目 3D 目标检测任务的路边感知数据集)

paper | dataset

[8] DynamicEarthNet: Daily Multi-Spectral Satellite Dataset for Semantic Change Segmentation(用于语义变化分割的每日多光谱卫星数据集)

paper | data | website

[7] Egocentric Prediction of Action Target in 3D(以自我为中心的 3D 行动目标预测)(机器人)

paper | project

[6] M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining(电子商务多模态预训练的自协调对比学习)(多模态预训练数据集)

paper

[5] FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos(用于视频中面部表情识别的大规模多场景数据集)

paper

[4] Ego4D: Around the World in 3,000 Hours of Egocentric Video(3000 小时以自我为中心的视频环游世界)

paper | project

[3] GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains(用于细粒度和域自适应识别谷物的大规模数据集)

paper | dataset

[2] Kubric: A scalable dataset generator(Kubric：可扩展的数据集生成器)

paper | code

[1] A Large-scale Comprehensive Dataset and Copy-overlap Aware Evaluation Protocol for Segment-level Video Copy Detection(用于分段级视频复制检测的大规模综合数据集和复制重叠感知评估协议)

VCSL (Video Copy Segment Localization) dataset

paper | dataset, metric and benchmark codes

主动学习(Active Learning)

[1] Active Learning by Feature Mixing(通过特征混合进行主动学习)

paper | code

小样本学习/零样本学习(Few-shot Learning/Zero-shot Learning)

[4] Integrative Few-Shot Learning for Classification and Segmentation(用于分类和分割的集成小样本学习)

paper

[3] Ranking Distance Calibration for Cross-Domain Few-Shot Learning(跨域小样本学习的排名距离校准)

paper

[2] Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification(小样本分类的相互集中学习)

paper

[1] MSDN: Mutually Semantic Distillation Network for Zero-Shot Learning(用于零样本学习的相互语义蒸馏网络)

keywords: Zero-Shot Learning, Knowledge Distillation

paper | code

持续学习(Continual Learning/Life-long Learning)

[5] GCR: Gradient Coreset Based Replay Buffer Selection For Continual Learning(用于持续学习的基于梯度核心集的重放缓冲区选择)

paper

[4] Probing Representation Forgetting in Supervised and Unsupervised Continual Learning(探索有监督和无监督持续学习中的表征遗忘)

paper

[3] Meta-attention for ViT-backed Continual Learning(ViT 支持的持续学习的元注意力)

paper | code

[2] Learning to Prompt for Continual Learning(学习提示持续学习)

paper | code

[1] On Generalizing Beyond Domains in Cross-Domain Continual Learning(关于跨域持续学习中的域外泛化)

paper

场景图(Scene Graph)

[1] Continuous Scene Representations for Embodied AI(具身 AI 的连续场景表示)

paper | project | code | video

场景图生成(Scene Graph Generation)

[2] Stacked Hybrid-Attention and Group Collaborative Learning for Unbiased Scene Graph Generation(用于无偏场景图生成的堆叠混合注意力和组协作学习)

paper | code

[1] Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs(将视频场景图重新格式化为时间二分图)

keywords: Video Scene Graph Generation, Transformer, Video Grounding

paper | code

场景图预测(Scene Graph Prediction)

场景图理解(Scene Graph Understanding)

视觉定位/位姿估计(Visual Localization/Pose Estimation)

[16] ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework(一种计算效率高且具有对称性的 6D 姿势回归框架)

paper | code

[15] Templates for 3D Object Pose Estimation Revisited: Generalization to New Objects and Robustness to Occlusions(重新审视 3D 对象姿态估计的模板：对新对象的泛化和对遮挡的鲁棒性)

paper | code

[14] OSOP: A Multi-Stage One Shot Object Pose Estimation Framework(多阶段 One Shot 对象姿态估计框架)

paper

[13] Putting People in their Place: Monocular Regression of 3D People in Depth(3D 人物深度的单目回归)

paper | code | dataset

[12] FS6D: Few-Shot 6D Pose Estimation of Novel Objects(新物体的小样本 6D 姿态估计)

paper | project

[11] Uni6D: A Unified CNN Framework without Projection Breakdown for 6D Pose Estimation(用于 6D 姿势估计的无投影分解的统一 CNN 框架)

paper

[10] EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation(用于单目物体姿态估计的广义端到端概率透视-n-点)

paper

[9] RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization(具有鲁棒对应场估计和姿态优化的递归 6-DoF 对象姿态细化)

paper | code

[8] DiffPoseNet: Direct Differentiable Camera Pose Estimation(直接可微分相机位姿估计)

paper

[7] ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation(用于 6DoF 对象姿态估计的粗到细表面编码)

paper

[6] Object Localization under Single Coarse Point Supervision(单粗点监督下的目标定位)

paper | code

[5] CrossLoc: Scalable Aerial Localization Assisted by Multimodal Synthetic Data(多模式合成数据辅助的可扩展空中定位)

paper | code

[4] GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting(通过几何引导的逐点投票进行类别级对象位姿估计)

paper | code

[3] CPPF: Towards Robust Category-Level 9D Pose Estimation in the Wild(CPPF：在野外实现稳健的类别级 9D 位姿估计)

paper | code

[2] OVE6D: Object Viewpoint Encoding for Depth-based 6D Object Pose Estimation(用于基于深度的 6D 对象位姿估计的对象视点编码)

paper | code

[1] Spatial Commonsense Graph for Object Localisation in Partial Scenes(局部场景中对象定位的空间常识图)

paper | code | project

视觉推理/视觉问答(Visual Reasoning/VQA)

[5] SimVQA: Exploring Simulated Environments for Visual Question Answering(探索视觉问答的模拟环境)

paper | project

[4] Learning to Answer Questions in Dynamic Audio-Visual Scenarios(学习在动态视听场景中回答问题)(视听学习)

paper | code

[3] Visual Abductive Reasoning(视觉溯因推理)

paper | code

[2] MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering(基于知识的视觉问答的多模态知识提取与积累)

paper | code

[1] REX: Reasoning-aware and Grounded Explanation(推理意识和扎根的解释)

paper | code

图像分类(Image Classification)

[2] CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification(共同适应判别特征以改进小样本分类)

paper

[1] GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction(用于多类别属性预测的基于全局、局部和内在的密集嵌入网络)

keywords: multi-label classification

paper | code | project

迁移学习/domain/自适应(Transfer Learning/Domain Adaptation)

[10] Transferability Estimation using Bhattacharyya Class Separability(使用 Bhattacharyya 类可分离性的可迁移性估计)

paper

[9] The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization(通过归一化进行动态无监督域自适应)

paper | code

[8] Continual Test-Time Domain Adaptation(持续测试时域适应)

paper | code

[7] Compound Domain Generalization via Meta-Knowledge Encoding(基于元知识编码的复合域泛化)

paper

[6] Learning Affordance Grounding from Exocentric Images(从离中心图像中学习可供性基础)

paper | code

[5] Category Contrast for Unsupervised Domain Adaptation in Visual Tasks(视觉任务中无监督域适应的类别对比)

paper

[4] Learning Distinctive Margin toward Active Domain Adaptation(向主动领域适应学习独特的边际)

paper | code

[3] How Well Do Sparse ImageNet Models Transfer?(稀疏 ImageNet 模型的迁移效果如何？)

paper

[2] A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation(用于手语翻译的简单多模态迁移学习基线)

paper

[1] Weakly Supervised Object Localization as Domain Adaption(作为域适应的弱监督对象定位)

keywords: Weakly Supervised Object Localization(WSOL), Multi-instance learning based WSOL, Separated-structure based WSOL, Domain Adaption

paper | code

度量学习(Metric Learning)

[4] Hyperbolic Vision Transformers: Combining Improvements in Metric Learning(双曲线视觉transformer：结合度量学习的改进)

paper | code

[3] Non-isotropy Regularization for Proxy-based Deep Metric Learning(基于代理的深度度量学习的非各向同性正则化)

paper | code

[2] Integrating Language Guidance into Vision-based Deep Metric Learning(将语言指导集成到基于视觉的深度度量学习中)

paper | code

[1] Enhancing Adversarial Robustness for Deep Metric Learning(增强深度度量学习的对抗鲁棒性)

keywords: Adversarial Attack, Adversarial Defense, Deep Metric Learning

paper

对比学习(Contrastive Learning)

[6] Versatile Multi-Modal Pre-Training for Human-Centric Perception(用于以人为中心的感知的多功能多模态预训练)

paper | project | code

[5] Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation(用于弱监督对象定位和语义分割的类不可知激活图的对比学习)

paper | [code](https://github.com/CVI-SZU/CCAM)

[4] Rethinking Minimal Sufficient Representation in Contrastive Learning(重新思考对比学习中的最小充分表示)

paper | code

[3] Selective-Supervised Contrastive Learning with Noisy Labels(带有噪声标签的选择性监督对比学习)

paper | code

[2] HCSC: Hierarchical Contrastive Selective Coding(分层对比选择性编码)

keywords: Self-supervised Representation Learning, Deep Clustering, Contrastive Learning

paper | code

[1] Crafting Better Contrastive Views for Siamese Representation Learning(为连体表示学习制作更好的对比视图)

paper | code

增量学习(Incremental Learning)

[3] Mimicking the Oracle: An Initial Phase Decorrelation Approach for Class Incremental Learning(类增量学习的初始阶段去相关方法)

paper | code

[2] Forward Compatible Few-Shot Class-Incremental Learning(前后兼容的小样本类增量学习)

paper | code

[1] Self-Sustaining Representation Expansion for Non-Exemplar Class-Incremental Learning(非示例类增量学习的自我维持表示扩展)

paper

强化学习(Reinforcement Learning)

[1] Bailando: 3D Dance Generation by Actor-Critic GPT with Choreographic Memory(具有编排记忆的演员评论家 GPT 的 3D 舞蹈生成)

paper | code

元学习(Meta Learning)

[3] A Structured Dictionary Perspective on Implicit Neural Representations(隐式神经表示的结构化字典视角)

paper | code

[2] Multidimensional Belief Quantification for Label-Efficient Meta-Learning(标签高效元学习的多维信念量化)

paper

[1] What Matters For Meta-Learning Vision Regression Tasks?(元学习视觉回归任务的重要性是什么？)

paper

机器人(Robotic)

[2] Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation(通过离散化实现视觉机器人操作的高效学习)

paper | code | project

[1] IFOR: Iterative Flow Minimization for Robotic Object Rearrangement(IFOR：机器人对象重排的迭代流最小化)

paper | project

半监督学习/弱监督学习/无监督学习/自监督学习(Self-supervised Learning/Semi-supervised Learning)

[8] When Does Contrastive Visual Representation Learning Work?(对比视觉表征学习何时起作用)

paper

[7] Leverage Your Local and Global Representations: A New Self-Supervised Learning Strategy(利用局部和全局表征：一种新的自我监督学习策略)

paper

[6] Decoupling Makes Weakly Supervised Local Feature Better(解耦使弱监督的局部特征更好)

paper | code

[5] SimMatch: Semi-supervised Learning with Similarity Matching(具有相似性匹配的半监督学习)

paper | code

[4] Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements(一个完全无监督的框架，用于从噪声和部分测量中学习图像)

paper | code

[3] UniVIP: A Unified Framework for Self-Supervised Visual Pre-training(自监督视觉预训练的统一框架)

paper

[2] Class-Aware Contrastive Semi-Supervised Learning(类感知对比半监督学习)

keywords: Semi-Supervised Learning, Self-Supervised Learning, Real-World Unlabeled Data Learning

paper

[1] A study on the distribution of social biases in self-supervised learning visual models(自监督学习视觉模型中social biases分布的研究)

paper

Neural Network Interpretability

[2] Do Explanations Explain? Model Knows Best

paper

[1] Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks

paper

Image Counting

[3] DR.VIC: Decomposition and Reasoning for Video Individual Counting

paper | code

[2] Represent, Compare, and Learn: A Similarity-Aware Framework for Class-Agnostic Counting

paper | code

[1] Boosting Crowd Counting via Multifaceted Attention

paper | code

Federated Learning

[5] FedCor: Correlation-Based Active Client Selection Strategy for Heterogeneous Federated Learning

paper

[4] FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction

paper | code

[3] Federated Class-Incremental Learning

paper | code

[2] Fine-tuning Global Model via Data-Free Knowledge Distillation for Non-IID Federated Learning

paper

[1] Differentially Private Federated Learning with Local Regularization and Sparsification

paper

Other

Less is More: Generating Grounded Navigation Instructions from Landmarks (visual navigation)

paper

Fast, Accurate and Memory-Efficient Partial Permutation Synchronization

paper

Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations

paper | code

Clean Implicit 3D Structure from Noisy 2D STEM Images

paper

ZZ-Net: A Universal Rotation Equivariant Architecture for 2D Point Clouds

paper

MDAN: Multi-level Dependent Attention Network for Visual Emotion Analysis

paper

Moving Window Regression: A Novel Approach to Ordinal Regression

paper | code

Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction

paper | code

TransVPR: Transformer-based place recognition with multi-level attention aggregation (image matching)

paper

Ev-TTA: Test-Time Adaptation for Event-Based Object Recognition

paper

Learning from All Vehicles (autonomous driving)

paper | code | demo

Mixed Differential Privacy in Computer Vision

paper

Robust and Accurate Superquadric Recovery: a Probabilistic Approach

paper | code

AirObject: A Temporally Evolving Graph Embedding for Object Identification (object encoding)

paper | code

FastDOG: Fast Discrete Optimization on GPU

paper | code

Neural Collaborative Graph Machines for Table Structure Recognition

paper

Contrastive Conditional Neural Processes

paper

Deep Rectangling for Image Stitching: A Learning Baseline (Image Stitching)

paper | code

Online Learning of Reusable Abstract Models for Object Goal Navigation

paper

PINA: Learning a Personalized Implicit Neural Avatar from a Single RGB-D Video Sequence

paper | video | project

2. CVPR2022 Oral

[13] I M Avatar: Implicit Morphable Head Avatars from Videos (Oral)

paper | project

[12] Parameter-free Online Test-time Adaptation (Oral)

paper | code

[11] Correlation Verification for Image Retrieval (Oral)

paper | code

[10] Rethinking Semantic Segmentation: A Prototype View (Oral)

paper | code

[9] SNUG: Self-Supervised Neural Dynamic Garments (Oral)

paper | project

[8] SelfRecon: Self Reconstruction Your Digital Avatar from Monocular Video (Oral)

paper | code

[7] Dual-AI: Dual-path Action Interaction Learning for Group Activity Recognition (Oral)

paper | project

[6] 3D Common Corruptions and Data Augmentation (Oral)

paper | project

[5] GAN-Supervised Dense Visual Alignment (Oral)

paper | code | project

[4] It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher (Oral)

paper

[3] AdaMixer: A Fast-Converging Query-Based Object Detector (Oral)

paper | code

[2] Multi-View Depth Estimation by Fusing Single-View Depth Probability with Multi-View Geometry (Oral)

paper | code

[1] L-Verse: Bidirectional Generation Between Image and Text (vision-language representation learning)

paper

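3. CVPR2022 Paper Write-up Roundup

[22] The MLP is the key reason unsupervised learning transfers better than supervised learning

[21] Accurate and efficient multi-person 3D pose estimation: Meitu & Beihang University propose a distribution-aware single-stage model

[20] Drawing on domain adaptation, Peking University and ByteDance propose a new weakly supervised object localization framework

[19] From just one image plus a camera trajectory, AI can imagine the surrounding environment

[18] Point-BERT: Pre-training point cloud Transformers with masked point modeling

[17] Swin Transformer v2.0 reaches 3 billion parameters: should we embrace large vision models?

[16] Adobe stitches GANs together into a patchwork model that synthesizes 1024-resolution full-body human images from scratch

[15] USTC and collaborators propose Neural Points, a continuous implicit point cloud representation with impressive upsampling results

[14] Max Planck Institute open-sources ICON, which markedly improves pose quality in single-image 3D human reconstruction

[13] Images are de Broglie waves too! Huawei Noah's Ark Lab & Peking University propose a quantum-inspired MLP that outperforms Swin Transformer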

[12] Manycore Tech's frontier research institute and collaborators propose the first data-driven face detection algorithm

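[11] MPViT: Multi-Path Vision Transformer for dense prediction

[10] ST++: A better self-training paradigm for semi-supervised semantic segmentation

[9] New SOTA for CNN self-supervised pre-training! SJTU and collaborators propose HCSC, a new self-supervised framework for learning hierarchically structured image representations

[8] Restormer: new best results on multiple low-level vision tasks

[7] Endless hairstyles! USTC and collaborators propose HairCLIP, a hair editing method driven by text and reference images

[6] Why can a 31x31 convolution kernel run about as fast as a 9x9 one?
RepLKNet: large-kernel convolution + structural re-parameterization make CNNs great again

[5] U2PL: Semi-supervised semantic segmentation using unreliable pseudo-labels

[4] Focal and Global Knowledge Distillation for object detection (FGD)

[3] Plug and play! ContrastiveCrop, which boosts self-supervised learning performance, is now open source

[2] FAIR's ConvNeXt, a new pure-convolution model, explained from principles to code
A ConvNet "renaissance" beats Transformers: FAIR redesigns a pure convolutional architecture

[1] Ming-Ming Cheng's team at Nankai University and Tianjin University propose LD: Localization Distillation for object detection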

4. CVPR2022 Paper Sharing

5. To do list
