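CVPR 2021 Papers and Open-Source Projects (papers with code)!
CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt
Note 1: Everyone is welcome to submit an issue to share CVPR 2021 papers and open-source projects!
Note 2: For papers from previous top CV conferences, along with other high-quality CV papers and roundups, see: https://github.com/amusi/daily-paper-computer-vision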
Decoupled Dynamic Filter Networks
Lite-HRNet: A Lightweight High-Resolution Network
CondenseNet V2: Sparse Feature Reactivation for Deep Networks
Paper: https://arxiv.org/abs/2104.04382
Code: https://github.com/jianghaojun/CondenseNetV2
Diverse Branch Block: Building a Convolution as an Inception-like Unit
Paper: https://arxiv.org/abs/2103.13425
Code: https://github.com/DingXiaoH/DiverseBranchBlock
Scaling Local Self-Attention For Parameter Efficient Visual Backbones
Paper(Oral): https://arxiv.org/abs/2103.12731
Code: None
ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
Involution: Inverting the Inherence of Convolution for Visual Recognition
Coordinate Attention for Efficient Mobile Network Design
Inception Convolution with Efficient Dilation Search
RepVGG: Making VGG-style ConvNets Great Again
Combined Depth Space based Architecture Search For Person Re-identification
DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
Neural Architecture Search with Random Labels
Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
Prioritized Architecture Sampling with Monto-Carlo Tree Search
Contrastive Neural Architecture Search with Neural Architecture Comparators
AttentiveNAS: Improving Neural Architecture Search via Attentive
ReNAS: Relativistic Evaluation of Neural Architecture Search
HourNAS: Extremely Fast Neural Architecture
Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
Inception Convolution with Efficient Dilation Search
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
Paper: https://arxiv.org/abs/2104.03064
Code: https://github.com/ecnuycxie/DG-Font
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
Regularizing Generative Adversarial Networks under Limited Data
Towards Real-World Blind Face Restoration with Generative Facial Prior
TediGAN: Text-Guided Diverse Image Generation and Manipulation
Homepage: https://xiaweihao.com/projects/tedigan/
Paper: https://arxiv.org/abs/2012.03308
Code: https://github.com/weihaox/TediGAN
Generative Hierarchical Features from Synthesizing Image
Homepage: https://genforce.github.io/ghfeat/
Paper(Oral): https://arxiv.org/abs/2007.10379
Code: https://github.com/genforce/ghfeat
Teachers Do More Than Teach: Compressing Image-to-Image Models
HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
Homepage: https://marcoamonteiro.github.io/pi-GAN-website/
Paper(Oral): https://arxiv.org/abs/2012.00926
Code: None
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
Diverse Semantic Image Synthesis via Probability Distribution Modeling
LOHO: Latent Optimization of Hairstyles via Orthogonalization
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
A 3D GAN for Improved Large-pose Facial Recognition
HumanGAN: A Generative Model of Humans Images
ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
CoMoGAN: continuous model-guided image-to-image translation
Training Generative Adversarial Networks in One Stage
Closed-Form Factorization of Latent Semantics in GANs
Anycost GANs for Interactive Image Synthesis and Editing
Image-to-image Translation via Hierarchical Style Disentanglement
Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
Homepage: https://taldatech.github.io/soft-intro-vae-web/
Paper: https://arxiv.org/abs/2012.13253
Code: https://github.com/taldatech/soft-intro-vae-pytorch
1. End-to-End Human Pose and Mesh Reconstruction with Transformers
2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
4. HOTR: End-to-End Human-Object Interaction Detection with Transformers
5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
6. Pose Recognition with Cascade Transformers
Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR
7. Variational Transformer Networks for Layout Generation
8. LoFTR: Detector-Free Local Feature Matching with Transformers
9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
11. Transformer Tracking
12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
13. MIST: Multiple Instance Spatial Transformer
14. Multimodal Motion Prediction with Stacked Transformers
15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
Code: https://github.com/amzn/image-to-recipe-transformers
16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
17. Pre-Trained Image Processing Transformer
18. End-to-End Video Instance Segmentation with Transformers
19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
20. End-to-End Human Object Interaction Detection with HOI Transformer
21. Transformer Interpretability Beyond Attention Visualization
22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer
23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
24. Line Segment Detection Using Transformers without Edges
25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
27. Facial Action Unit Detection With Transformers
28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
29. Lesion-Aware Transformers for Diabetic Retinopathy Grading
30. Topological Planning With Transformers for Vision-and-Language Navigation
31. Adaptive Image Transformer for One-Shot Object Detection
32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
33. Taming Transformers for High-Resolution Image Synthesis
34. Self-Supervised Video Hashing via Bidirectional Transformers
35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
36. Gaussian Context Transformer
37. General Multi-Label Image Classification With Transformers
38. Bottleneck Transformers for Visual Recognition
39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
41. Self-attention based Text Knowledge Mining for Text Detection
42. SSAN: Separable Self-Attention Network for Video Representation Learning
43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones
Paper(Oral): https://arxiv.org/abs/2103.12731
Code: None
Regularizing Neural Networks via Adversarial Model Perturbation
Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation
Generalizing to the Open World: Deep Visual Odometry with Online Adaptation
Adversarial Robustness under Long-Tailed Distribution
Distribution Alignment: A Unified Framework for Long-tail Visual Recognition
Adaptive Class Suppression Loss for Long-Tail Object Detection
Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
Scale-aware Automatic Augmentation for Object Detection
Paper: https://arxiv.org/abs/2103.17220
Code: https://github.com/Jia-Research-Lab/SA-AutoAug
Domain-Specific Suppression for Adaptive Object Detection
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Paper: https://arxiv.org/abs/2104.14558
Code: https://github.com/facebookresearch/SlowFast
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
Self-supervised Video Representation Learning by Context and Motion Decoupling
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
Spatially Consistent Representation Learning
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
Exploring Simple Siamese Representation Learning
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
Capsule Network is Not More Robust than Convolutional Network
Domain-Specific Suppression for Adaptive Object Detection
IQDet: Instance-wise Quality Distribution Sampling for Object Detection
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Adaptive Class Suppression Loss for Long-Tail Object Detection
VarifocalNet: An IoU-aware Dense Object Detector
Paper(Oral): https://arxiv.org/abs/2008.13367
Code: https://github.com/hyz-xmaster/VarifocalNet
Scale-aware Automatic Augmentation for Object Detection
Paper: https://arxiv.org/abs/2103.17220
Code: https://github.com/Jia-Research-Lab/SA-AutoAug
OTA: Optimal Transport Assignment for Object Detection
Distilling Object Detectors via Decoupled Features
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Positive-Unlabeled Data Purification in the Wild for Object Detection
Instance Localization for Self-supervised Detection Pretraining
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
End-to-End Object Detection with Fully Convolutional Network
Robust and Accurate Object Detection via Adversarial Learning
Paper: https://arxiv.org/abs/2103.13886
Code: None
I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
YOLOF: You Only Look One-level Feature
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
General Instance Distillation for Object Detection
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
Multiple Instance Active Learning for Object Detection
Towards Open World Object Detection
Adaptive Image Transformer for One-Shot Object Detection
Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
Few-Shot Object Detection via Contrastive Proposal Encoding
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
Paper: https://arxiv.org/abs/2103.07733
Code: https://github.com/csuhan/ReDet
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
Paper: https://arxiv.org/abs/2104.14545
Code: https://github.com/researchmm/LightTrack
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
Homepage: https://sites.google.com/view/langtrackbenchmark/
Paper: https://arxiv.org/abs/2103.16746
Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Graph Attention Tracking
Rotation Equivariant Siamese Networks for Tracking
Track to Detect and Segment: An Online Multi-Object Tracker
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack
Transformer Tracking
Multiple Object Tracking with Correlation Learning
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
Learning a Proposal Classifier for Multiple Object Tracking
Track to Detect and Segment: An Online Multi-Object Tracker
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Rethinking BiSeNet For Real-time Semantic Segmentation
Paper: https://arxiv.org/abs/2104.13188
Code: https://github.com/MichaelFan01/STDC-Seg
Progressive Semantic Segmentation
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Bidirectional Projection Network for Cross Dimension Scene Understanding
Cross-Dataset Collaborative Learning for Semantic Segmentation
Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
Capturing Omni-Range Context for Omnidirectional Segmentation
Learning Statistical Texture for Semantic Segmentation
PLOP: Learning without Forgetting for Continual Semantic Segmentation
Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/
Paper: https://arxiv.org/abs/2104.00905
Code: None
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
Paper: https://arxiv.org/abs/2105.00097
Code: https://github.com/visinf/da-sac
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
Incremental Few-Shot Instance Segmentation
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
Multi-Scale Aligned Distillation for Low-Resolution Detection
Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
Code: https://github.com/Jia-Research-Lab/MSAD
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
Homepage: https://bowenc0221.github.io/boundary-iou/
Paper: https://arxiv.org/abs/2103.16562
Code: https://github.com/bowenc0221/boundary-iou-api
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
Paper: https://arxiv.org/abs/2103.12340
Code: https://github.com/lkeab/BCNet
Zero-shot instance segmentation (CVPR 2021 acceptance not confirmed)
STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
End-to-End Video Instance Segmentation with Transformers
Exemplar-Based Open-Set Panoptic Segmentation Network
MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
Panoptic Segmentation Forecasting
Fully Convolutional Networks for Panoptic Segmentation
Paper: https://arxiv.org/abs/2012.00720
Code: https://github.com/yanwei-li/PanopticFCN
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
Learning Position and Target Consistency for Memory-based Video Object Segmentation
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
Homepage: https://hkchengrex.github.io/MiVOS/
Paper: https://arxiv.org/abs/2103.07941
Code: https://github.com/hkchengrex/MiVOS
Demo: https://hkchengrex.github.io/MiVOS/video.html#partb
Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
Paper: https://arxiv.org/abs/2103.10391
Code: https://github.com/svip-lab/IVOS-W
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
Paper: https://arxiv.org/abs/2104.02628
Code: https://github.com/JingZhang617/Joint_COD_SOD
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
Paper: https://arxiv.org/abs/2104.02628
Code: https://github.com/JingZhang617/Joint_COD_SOD
Group Collaborative Learning for Co-Salient Object Detection
Semantic Image Matting
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
Combined Depth Space based Architecture Search For Person Re-identification
Anchor-Free Person Search
Temporal-Relational CrossTransformers for Few-Shot Action Recognition
FrameExit: Conditional Early Exiting for Efficient Video Recognition
No frame left behind: Full Video Action Recognition
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
ACTION-Net: Multipath Excitation for Action Recognition
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
TDN: Temporal Difference Networks for Efficient Action Recognition
A 3D GAN for Improved Large-pose Facial Recognition
MagFace: A Universal Representation for Face Recognition and Quality Assessment
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
Cross Modal Focal Loss for RGBD Face Anti-Spoofing
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
Multi-attentional Deepfake Detection
Continuous Face Aging via Self-estimated Residual Age Embedding
PML: Progressive Margin Loss for Long-tailed Age Classification
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
Pose Recognition with Cascade Transformers
Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR
DCPose: Deep Dual Consecutive Network for Human Pose Estimation
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
Paper(Oral): https://arxiv.org/abs/2105.02465
Code: https://github.com/jfzhang95/PoseAug
Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
Homepage: http://www.liuyebin.com/posefusion/posefusion.html
Paper(Oral): https://arxiv.org/abs/2103.15331
Code: None
Fourier Contour Embedding for Arbitrary-Shaped Text Detection
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Checkerboard Context Model for Efficient Learned Image Compression
Slimmable Compressive Autoencoders for Practical Neural Image Compression
Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
Teachers Do More Than Teach: Compressing Image-to-Image Models
Dynamic Slimmable Network
Network Quantization with Element-wise Gradient Scaling
Zero-shot Adversarial Quantization
Learnable Companding Quantization for Accurate Low-bit Neural Networks
Distilling Knowledge via Knowledge Review
Distilling Object Detectors via Decoupled Features
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
AdderSR: Towards Energy Efficient Image Super-Resolution
Contrastive Learning for Compact Single Image Dehazing
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Multi-Stage Progressive Image Restoration
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
High-Fidelity and Arbitrary Face Editing
Anycost GANs for Interactive Image Synthesis and Editing
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
Paper: https://arxiv.org/abs/2104.03064
Code: https://github.com/ecnuycxie/DG-Font
LoFTR: Detector-Free Local Feature Matching with Transformers
Convolutional Hough Matching Networks
Bridging the Visual Gap: Wide-Range Image Blending
Paper: https://arxiv.org/abs/2103.15149
Code: https://github.com/julia0607/Wide-Range-Image-Blending
Robust Reflection Removal with Reflection-free Flash-only Cues
Equivariant Point Network for 3D Point Cloud Analysis
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/
Paper: https://arxiv.org/abs/2104.00902
Code: https://github.com/cvlab-yonsei/HVPR
LiDAR R-CNN: An Efficient and Universal 3D Object Detector
M3DSSD: Monocular 3D Single Stage Object Detector
Paper: https://arxiv.org/abs/2103.13164
Code: https://github.com/mumianyuxin/M3DSSD
SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
Center-based 3D Object Detection and Tracking
Categorical Depth Distribution Network for Monocular 3D Object Detection
Bidirectional Projection Network for Cross Dimension Scene Understanding
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
Center-based 3D Object Detection and Tracking
ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
PREDATOR: Registration of 3D Point Clouds with Low Overlap
Unsupervised 3D Shape Completion through GAN Inversion
Variational Relational Point Completion Network
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
Homepage: https://alphapav.github.io/SpareNet/
Paper: https://arxiv.org/abs/2103.02535
Code: https://github.com/microsoft/SpareNet
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
Homepage: https://zju3dv.github.io/neuralrecon/
Paper(Oral): https://arxiv.org/abs/2104.00681
Code: https://github.com/zju3dv/NeuralRecon
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
Beyond Image to Depth: Improving Depth Prediction using Echoes
S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
Depth from Camera Motion and Object Detection
A Decomposition Model for Stereo Matching
Self-Supervised Multi-Frame Monocular Scene Flow
RAFT-3D: Scene Flow using Rigid-Motion Embeddings
Learning Optical Flow From Still Images
Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/
Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf
Code: https://github.com/mattpoggi/depthstillation
FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd
Enhancing the Transferability of Adversarial Attacks through Variance Tuning
LiBRe: A Practical Bayesian Approach to Adversarial Detection
Natural Adversarial Examples
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
On Semantic Similarity in Video Retrieval
Homepage: https://mwray.github.io/SSVR/
Paper: https://arxiv.org/abs/2103.10095
Code: https://github.com/mwray/Semantic-Video-Retrieval
Cross-Modal Center Loss for 3D Cross-Modal Retrieval
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
Code: https://github.com/amzn/image-to-recipe-transformers
Counterfactual Zero-Shot and Open-Set Visual Recognition
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
CDFI: Compression-Driven Network Design for Frame Interpolation
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
Homepage: https://tarun005.github.io/FLAVR/
Paper: https://arxiv.org/abs/2012.08512
Code: https://github.com/tarun005/FLAVR
Transformation Driven Visual Reasoning
Taming Transformers for High-Resolution Image Synthesis
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
Self-Supervised Visibility Learning for Novel View Synthesis
NeX: Real-time View Synthesis with Neural Basis Expansion
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
Variational Transformer Networks for Layout Generation
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Adaptive Methods for Real-World Domain Generalization
FSDR: Frequency Space Domain Randomization for Domain Generalization
Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
Domain Consensus Clustering for Universal Domain Adaptation
Towards Open World Object Detection
Exemplar-Based Open-Set Panoptic Segmentation Network
Learning Placeholders for Open-Set Recognition
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
HOTR: End-to-End Human-Object Interaction Detection with Transformers
Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
Reformulating HOI Detection as Adaptive Set Prediction
Detecting Human-Object Interaction via Fabricated Compositional Learning
End-to-End Human Object Interaction Detection with HOI Transformer
Auto-Exposure Fusion for Single-Image Shadow Removal
Parser-Free Virtual Try-on via Distilling Appearance Flows
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
Learning To Count Everything
Semantic Image Matting
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
Visual Semantic Role Labeling for Video Understanding
Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10895
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
Depth from Camera Motion and Object Detection
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Omnimatte: Associating Objects and Their Effects in Video
Homepage: https://omnimatte.github.io/
Paper(Oral): https://arxiv.org/abs/2105.06993
Code: https://omnimatte.github.io/#code
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
Motion Representations for Articulated Animation
Deep Lucas-Kanade Homography for Multimodal Image Alignment
Skip-Convolutions for Efficient Video Processing
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
Homepage: http://tomasjakab.github.io/KeypointDeformer
Paper(Oral): https://arxiv.org/abs/2104.11224
Code: https://github.com/tomasjakab/keypoint_deformer/
SOLD2: Self-supervised Occlusion-aware Line Description and Detection
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
LEAP: Learning Articulated Occupancy of People
UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
Towards High Fidelity Face Relighting with Realistic Shadows
BRepNet: A topological message passing system for solid models
Visually Informed Binaural Audio Generation without Binaural Audios
Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
Paper: None
Code: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc
Exploring intermediate representation for monocular vehicle pose estimation
Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
Invertible Image Signal Processing
Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
Embedding Transfer with Label Relaxation for Improved Metric Learning
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes
Meta-Mining Discriminative Samples for Kinship Verification
Cloud2Curve: Generation and Vectorization of Parametric Sketches
TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Homepage: http://wellyzhang.github.io/project/prae.html
Paper: https://arxiv.org/abs/2103.14230
Code: None
ACRE: Abstract Causal REasoning Beyond Covariation
Homepage: http://wellyzhang.github.io/project/acre.html
Paper: https://arxiv.org/abs/2103.14232
Code: None
Confluent Vessel Trees with Accurate Bifurcations
Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
Knowledge Evolution in Neural Networks
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
SGP: Self-supervised Geometric Perception
Paper: https://arxiv.org/abs/2103.03114
Code: https://github.com/theNded/SGP
Diffusion Probabilistic Models for 3D Point Cloud Generation
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models
Toward Explainable Reflection Removal with Distilling and Model Uncertainty
DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation
Exploring Adversarial Fake Images on Face Manifold
Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task
Temporal Contrastive Graph for Self-supervised Video Representation Learning
Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching
Fast and Memory-Efficient Compact Bilinear Pooling
Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine
Estimating A Child’s Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation