
CVPR 2020 Text Image Detection and Recognition: Papers and Code

Author: 我家自动化 | 2024-07-19 16:13:29


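CVPR 2020 accepted 1,470 papers. The main areas covered are image and video processing; image classification, detection, and segmentation; visual object tracking; video content analysis; human pose estimation; model acceleration; neural architecture search (NAS); generative adversarial networks (GAN); optical character recognition (OCR); face recognition; and 3D reconstruction.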

# Image Processing

1. Deep Image Harmonization via Domain Verification

Paper: Deep Image Harmonization via Domain Verification

Code: bcmi/Image_Harmonization_Datasets

2. Learning to Shade Hand-drawn Sketches

Paper: Learning to Shade Hand-drawn Sketches

3. Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

Paper: Generalized ODIN: Detecting Out-of-distribution Image without Learning from Out-of-distribution Data

4. Single Image Reflection Removal through Cascaded Refinement

Paper: arxiv.org/abs/1911.0663

5. RoutedFusion: Learning Real-time Depth Map Fusion

Paper: arxiv.org/pdf/2001.0438

# Image Classification

1. Towards Robust Image Classification Using Sequential Attention Models

Paper: Towards Robust Image Classification Using Sequential Attention Models

2. Self-training with Noisy Student improves ImageNet classification

Paper: Self-training with Noisy Student improves ImageNet classification

3. Image Matching across Wide Baselines: From Paper to Practice

Paper: Image Matching across Wide Baselines: From Paper to Practice

4. Improved Few-Shot Visual Classification

Paper: arxiv.org/pdf/1912.0343

5. A General and Adaptive Robust Loss Function

Paper: A General and Adaptive Robust Loss Function

6. Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks

Paper: Making Better Mistakes: Leveraging Class Hierarchies with Deep Networks

# Object Detection and Segmentation

1. Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

Paper: Few-Shot Object Detection with Attention-RPN and Multi-Relation Detector

2. Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

Paper: arxiv.org/abs/1912.0242

Code: sfzhang15/ATSS

3. Semi-Supervised Semantic Image Segmentation with Self-correcting Networks

Paper: Semi-Supervised Semantic Image Segmentation with Self-correcting Networks

4. Deep Snake for Real-Time Instance Segmentation

Paper: Deep Snake for Real-Time Instance Segmentation

5. SketchGCN: Semantic Sketch Segmentation with Graph Convolutional Networks

Paper: SketchGCN: Semantic Sketch Segmentation with Graph Convolutional Networks

6. xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

Paper: xMUDA: Cross-Modal Unsupervised Domain Adaptation for 3D Semantic Segmentation

7. CenterMask : Real-Time Anchor-Free Instance Segmentation

Paper: CenterMask : Real-Time Anchor-Free Instance Segmentation

Code: youngwanLEE/CenterMask

8. PolarMask: Single Shot Instance Segmentation with Polar Representation

Paper: PolarMask: Single Shot Instance Segmentation with Polar Representation

Code: xieenze/PolarMask

9. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Paper: BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

# Visual Object Tracking

1. ROAM: Recurrently Optimizing Tracking Model

Paper: ROAM: Recurrently Optimizing Tracking Model

# Video Content Analysis (Understanding)

1. Hierarchical Conditional Relation Networks for Video Question Answering

Paper: Hierarchical Conditional Relation Networks for Video Question Answering

2. Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

Paper: Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications

Code: bbrattoli/ZeroShotVideoClassification

3. Action Modifiers: Learning from Adverbs in Instructional Videos

Paper: Action Modifiers: Learning from Adverbs in Instructional Videos

4. Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

Paper: Fine-grained Video-Text Retrieval with Hierarchical Graph Reasoning

5. Blurry Video Frame Interpolation

Paper: Blurry Video Frame Interpolation

6. Object Relational Graph with Teacher-Recommended Learning for Video Captioning

Paper: Object Relational Graph with Teacher-Recommended Learning for Video Captioning

7. Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

Paper: Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs

8. Learning Representations by Predicting Bags of Visual Words

Paper: Learning Representations by Predicting Bags of Visual Words

9. Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

Paper: Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution

# Human Keypoint Detection and Pose Estimation

1. Distribution-Aware Coordinate Representation for Human Pose Estimation

Paper: Distribution-Aware Coordinate Representation for Human Pose Estimation

Code: ilovepose/DarkPose

2. VIBE: Video Inference for Human Body Pose and Shape Estimation

Paper: VIBE: Video Inference for Human Body Pose and Shape Estimation

Code: mkocabas/VIBE

3. The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

Paper: The Devil is in the Details: Delving into Unbiased Data Processing for Human Pose Estimation

4. Optimal least-squares solution to the hand-eye calibration problem

Paper: Optimal least-squares solution to the hand-eye calibration problem

5. D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

Paper: D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry

6. Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

Paper: Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

7. PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation

Paper: arxiv.org/abs/1911.0423

8. 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

Paper: 4D Association Graph for Realtime Multi-person Motion Capture Using Multiple Video Cameras

# Lightweight Models and Acceleration

1. GPU-Accelerated Mobile Multi-view Style Transfer

Paper: GPU-Accelerated Mobile Multi-view Style Transfer

# Neural Network Architecture Design and Search (NAS)

1. GhostNet: More Features from Cheap Operations

Paper: GhostNet: More Features from Cheap Operations

Code: huawei-noah/ghostnet

2. CARS: Continuous Evolution for Efficient Neural Architecture Search

Paper: arxiv.org/pdf/1909.0497

Code: huawei-noah/CARS

3. Visual Commonsense R-CNN

Paper: arxiv.org/abs/2002.1220

4. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions

Paper: Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions

5. AdderNet: Do We Really Need Multiplications in Deep Learning?

Paper: arxiv.org/pdf/1912.1320

6. Filter Grafting for Deep Neural Networks

Paper: arxiv.org/pdf/2001.0586

# Generative Adversarial Networks (GAN)

1. Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

Paper: Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

Code: giannisdaras/ylg

2. MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis

Paper: MSG-GAN: Multi-Scale Gradient GAN for Stable Image Synthesis

3. Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory

Paper: Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory

# 3D Point Clouds & 3D Reconstruction

1. PointAugment: an Auto-Augmentation Framework for Point Cloud Classification

Paper: PointAugment: an Auto-Augmentation Framework for Point Cloud Classification

Code: liruihui/PointAugment

2. PF-Net: Point Fractal Network for 3D Point Cloud Completion

Paper: PF-Net: Point Fractal Network for 3D Point Cloud Completion

3. Learning multiview 3D point cloud registration

Paper: Learning multiview 3D point cloud registration

4. Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Paper: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

5. In Perfect Shape: Certifiably Optimal 3D Shape Reconstruction from 2D Landmarks

Paper: arxiv.org/pdf/1911.1192

6. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

Paper: RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds

7. C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

Paper: C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds

8. Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs

Paper: Representations, Metrics and Statistics For Shape Analysis of Elastic Graphs

9. Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

Paper: Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

# Optical Character Recognition (OCR)

1. ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Paper: ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network

Code: github.com/Yuliang-Liu/

# Transfer Learning

1. Meta-Transfer Learning for Zero-Shot Super-Resolution

Paper: Meta-Transfer Learning for Zero-Shot Super-Resolution

2. Transferring Dense Pose to Proximal Animal Classes

Paper: Transferring Dense Pose to Proximal Animal Classes

# Weakly Supervised & Unsupervised Learning

1. Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

Paper: Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation

2. Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction

Paper: Disentangling Physical Dynamics from Unknown Factors for Unsupervised Video Prediction

3. Rethinking the Route Towards Weakly Supervised Object Localization

Paper: Rethinking the Route Towards Weakly Supervised Object Localization

4. NestedVAE: Isolating Common Factors via Weak Supervision

Paper: NestedVAE: Isolating Common Factors via Weak Supervision

# Face Recognition

1. Towards Universal Representation Learning for Deep Face Recognition

Paper: Towards Universal Representation Learning for Deep Face Recognition

2. Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Paper: Suppressing Uncertainties for Large-Scale Facial Expression Recognition

Code: kaiwang960112/Self-Cure-Network

3. Face X-ray for More General Face Forgery Detection

Paper: arxiv.org/pdf/1912.1345

# Graph Neural Networks (GNN)

1. Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

Paper: Social-STGCNN: A Social Spatio-Temporal Graph Convolutional Neural Network for Human Trajectory Prediction

2. Bundle Adjustment on a Graph Processor

Paper: Bundle Adjustment on a Graph Processor

Code: joeaortiz/gbp

# Vision & Language Tasks

1. Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

Paper: Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

Code: weituo12321/PREVALENT

2. 12-in-1: Multi-Task Vision and Language Representation Learning

Paper: 12-in-1: Multi-Task Vision and Language Representation Learning

3. Hierarchical Conditional Relation Networks for Video Question Answering

Paper: Hierarchical Conditional Relation Networks for Video Question Answering

# Other Topics

1. What it Thinks is Important is Important: Robustness Transfers through Input Gradients

Paper: arxiv.org/abs/1912.0569

2. Holistically-Attracted Wireframe Parsing

Paper: Holistically-Attracted Wireframe Parsing

3. Attentive Context Normalization for Robust Permutation-Equivariant Learning

Paper: Attentive Context Normalization for Robust Permutation-Equivariant Learning

5. ClusterFit: Improving Generalization of Visual Representations

Paper: ClusterFit: Improving Generalization of Visual Representations

6. Learning in the Frequency Domain

Paper: Learning in the Frequency Domain

7. A Characteristic Function Approach to Deep Implicit Generative Modeling

Paper: A Characteristic Function Approach to Deep Implicit Generative Modeling

8. Auto-Encoding Twin-Bottleneck Hashing

Paper: Auto-Encoding Twin-Bottleneck Hashing

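The rest of this post covers all CVPR 2020 papers on text images. They fall into two broad directions, handwritten text and scene text, for a total of 16 papers. The papers are grouped by topic below; most of them study recognition.

The topics are:

1) Scene Text Detection: locating text in scene images such as street views; 2 papers, both on detecting irregular, arbitrarily shaped text;

2) Scene Text Recognition: recognizing the regions returned by scene text detection; 4 papers;

3) Handwritten Text Recognition: 1 paper;

4) Scene Text Spotting (end-to-end recognition): 1 paper, the real-time ABCNet algorithm proposed by researchers from South China University of Technology and the University of Adelaide; it is an appealing method and is already open source;

5) Handwritten Text Generation: produces extra training samples of handwritten text (it also looks usable for "doing homework", jokingly); 1 paper;

6) Scene Text Synthesis: produces extra training samples of scene text; 1 paper, from Megvii, in which UnrealText uses a rendering engine to generate realistic scene text;

7) Data augmentation for text images: used to train handwritten and scene text recognition algorithms; 1 paper;

8) Scene Text Editing: replacing the text in a scene text image;

9) Shredded document reconstruction: rebuilding documents that have been destroyed into fragments, for use in criminal investigation; 1 paper;

10) Text style transfer: 1 paper;

11) Adversarial attacks on scene text recognition: 1 paper;

12) Handwriting (signature) verification: 1 paper.

Notably, 10 of the 16 papers are already open source or about to be; thanks to these developers.

For papers that are open source or about to be, the code address is included.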
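The totals quoted above can be checked against the numbered entries [1] to [16] listed further down. The following is a minimal sketch of that tally; the `ENTRIES` list and the `summarize` helper are illustrative names introduced here, and each tuple simply records the entry number, its topic, and whether a code link appears in this post.

```python
from collections import Counter

# (entry number, topic, code link listed in this post?)
# Values are read off the numbered entries [1]-[16] below; the tuple
# layout and the topic labels are illustrative choices, not a dataset format.
ENTRIES = [
    (1, "Scene Text Detection", True),
    (2, "Scene Text Detection", True),
    (3, "Scene Text Recognition", False),
    (4, "Scene Text Recognition", False),
    (5, "Scene Text Recognition", True),
    (6, "Scene Text Recognition", True),  # repository marked as coming soon
    (7, "Handwritten Text Recognition", True),
    (8, "Scene Text Spotting", True),
    (9, "Handwritten Text Generation", True),
    (10, "Scene Text Synthesis", True),
    (11, "Data Augmentation", True),
    (12, "Scene Text Editing", True),
    (13, "Shredded Document Reconstruction", False),
    (14, "Text Style Transfer", False),
    (15, "Adversarial Attacks", False),
    (16, "Signature Verification", False),
]


def summarize(entries):
    """Return (papers per topic, papers with code, total papers)."""
    per_topic = Counter(topic for _, topic, _ in entries)
    with_code = sum(1 for _, _, has_code in entries if has_code)
    return per_topic, with_code, len(entries)


per_topic, with_code, total = summarize(ENTRIES)
print(f"{with_code} of {total} papers list a code link")
for topic, count in per_topic.items():
    print(f"{count:2d}  {topic}")
```

Counting this way gives 16 entries, 10 of which carry a code link, which agrees with the figures stated above.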

You can go to:

http://openaccess.thecvf.com/CVPR2020.py

and download these papers by title.

Scene Text Detection

Deep relational reasoning graph network for arbitrary-shape text detection

[1].Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection

Authors | Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Chang Liu, Chun Yang, Hongfa Wang, Xu-Cheng Yin

Affiliations | University of Science and Technology Beijing; Joint Laboratory of Artificial Intelligence, University of Science and Technology of China; Tencent Technology (Shenzhen)

Code | https://github.com/GXYM/DRRG

Commentary | https://blog.csdn.net/SpicyCoder/article/details/105072570

[2].ContourNet: Taking a Further Step Toward Accurate Arbitrary-Shaped Scene Text Detection

Authors | Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang

Affiliations | University of Science and Technology of China

Code | https://github.com/wangyuxin87/ContourNet

Commentary | https://zhuanlan.zhihu.com/p/135399747

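Scene Text Recognition

On vocabulary reliance in scene text recognition

[3].On Vocabulary Reliance in Scene Text Recognition

Authors | Zhaoyi Wan, Jielei Zhang, Liang Zhang, Jiebo Luo, Cong Yao

Affiliations | Megvii; China University of Mining and Technology; University of Rochester

[4].SCATTER: Selective Context Attentional Scene Text Recognizer

Authors | Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha

Affiliations | Amazon Web Services

Semantic reasoning network for accurate scene text recognition

[5].Towards Accurate Scene Text Recognition With Semantic Reasoning Networks

Authors | Deli Yu, Xuan Li, Chengquan Zhang, Tao Liu, Junyu Han, Jingtuo Liu, Errui Ding

Affiliations | University of Chinese Academy of Sciences; Baidu; Chinese Academy of Sciences

Code | https://github.com/chenjun2hao/SRN.pytorch

Semantics-enhanced encoder-decoder framework for recognizing scene text in low-quality images (blur, uneven illumination, incomplete characters, and so on)

[6].SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

Authors | Zhi Qiao, Yu Zhou, Dongbao Yang, Yucan Zhou, Weiping Wang

Affiliations | Chinese Academy of Sciences; University of Chinese Academy of Sciences

Code | https://github.com/Pay20Y/SEED (coming soon)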

Handwritten Text Recognition

[7].OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold

Authors | Mohamed Yousef, Tom E. Bishop

Affiliations | Intuition Machines, Inc

Code | https://github.com/IntuitionMachines/OrigamiNet

Scene Text Spotting

Real-time end-to-end scene text recognition

[8].ABCNet: Real-Time Scene Text Spotting With Adaptive Bezier-Curve Network

Authors | Yuliang Liu, Hao Chen, Chunhua Shen, Tong He, Lianwen Jin, Liangwei Wang

Affiliations | South China University of Technology; University of Adelaide

Code | https://github.com/Yuliang-Liu/bezier_curve_text_spotting

Note | CVPR 2020 Oral

Commentary | https://zhuanlan.zhihu.com/p/146276834

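Handwritten Text Generation

Semi-supervised generation of varying-length handwritten text, to enlarge text datasets and improve recognition accuracy

[9].ScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation

Authors | Sharon Fogel, Hadar Averbuch-Elor, Sarel Cohen, Shai Mazor, Roee Litman

Affiliations | Amazon Rekognition, Israel; Cornell University

Code | https://github.com/amzn/convolutional-handwriting-gan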

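Scene Text Synthesis

Synthesizing scene text with a rendering engine, to add training samples and improve recognition accuracy

[10].UnrealText: Synthesizing Realistic Scene Text Images From the Unreal World

Authors | Shangbang Long, Cong Yao

Affiliations | Carnegie Mellon University; Megvii

Code | https://jyouhou.github.io/UnrealText/

Commentary | https://zhuanlan.zhihu.com/p/137406773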

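Data Augmentation + Text Recognition

Image augmentation for handwritten and scene text recognition

[11].Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition

Authors | Canjie Luo, Yuanzhi Zhu, Lianwen Jin, Yongpan Wang

Affiliations | South China University of Technology; Alibaba

Code | https://github.com/Canjie-Luo/Text-Image-Augmentation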

Scene Text Editing

[12].STEFANN: Scene Text Editor Using Font Adaptive Neural Network

Authors | Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal

Affiliations | Indian Statistical Institute; Indian Institute of Technology

Code | https://github.com/prasunroy/stefann

Website | https://prasunroy.github.io/stefann/

Shredded Document Reconstruction

Reconstructing documents from shredded paper, for forensic and other criminal investigations

[13].Fast(er) Reconstruction of Shredded Text Documents via Self-Supervised Deep Asymmetric Metric Learning

Authors | Thiago M. Paixao, Rodrigo F. Berriel, Maria C. S. Boeres, Alessandro L. Koerich, Claudine Badue, Alberto F. De Souza, Thiago Oliveira-Santos

Affiliations | IFES, Brazil; UFES, Brazil; ETS, Canada

Text Style Transfer

[14].SwapText: Image Based Texts Transfer in Scenes

Authors | Qiangpeng Yang, Jun Huang, Wei Lin

Affiliations | Alibaba

Scene Text Recognition + Adversarial Attacks

[15].What Machines See Is Not What They Get: Fooling Scene Text Recognition Models With Adversarial Text Images

Authors | Xing Xu, Jiefu Chen, Jinhui Xiao, Lianli Gao, Fumin Shen, Heng Tao Shen

Affiliations | University of Electronic Science and Technology of China

Handwriting (Signature) Verification

[16].Sequential Motif Profiles and Topological Plots for Offline Signature Verification

Authors | Elias N. Zois, Evangelos Zervas, Dimitrios Tsourounis, George Economou

Affiliations | University of West Attica; University of Patras
