


Attention-OCR

Bidirectional LSTM encoder and attention-enhanced GRU decoder stacked on a multilayer CNN for image-to-transcription.
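As a rough sketch of that stack, the following tf.keras model (TF 2.x style, not the repository's TF1 code) wires a small CNN into a bidirectional LSTM encoder and an attention-enhanced GRU decoder; the layer sizes are placeholders, not the values used in the training command below.

# Illustrative sketch only: CNN feature extractor -> BiLSTM encoder -> GRU decoder
# with attention. Sizes are placeholders; the repo's actual model is built in TF1.
import tensorflow as tf
from tensorflow.keras import layers

HEIGHT, VOCAB = 64, 79                                  # image height, target-vocab-size

image = tf.keras.Input(shape=(HEIGHT, None, 1))         # grayscale line image, any width
x = layers.Conv2D(64, 3, padding="same", activation="relu")(image)
x = layers.MaxPool2D(2)(x)
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPool2D(2)(x)
# Collapse the height axis so each image column becomes one encoder time step.
feat = layers.Lambda(lambda t: tf.reduce_mean(t, axis=1))(x)    # (batch, width/4, 128)

enc = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(feat)  # encoder states

dec_in = tf.keras.Input(shape=(None,), dtype="int32")   # shifted target character ids
emb = layers.Embedding(VOCAB, 5)(dec_in)                # cf. --target-embedding-size=5
dec = layers.GRU(256, return_sequences=True)(emb)       # GRU decoder states
ctx = layers.Attention()([dec, enc])                    # attend over encoder states
out = layers.Dense(VOCAB, activation="softmax")(layers.Concatenate()([dec, ctx]))

model = tf.keras.Model([image, dec_in], out)
model.summary()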

Please cite the paper if you use this code for academic research:

@article{DBLP:journals/corr/abs-1712-04046,
  author        = {Jason Poulos and Rafael Valle},
  title         = {Character-Based Handwritten Text Transcription with Attention Networks},
  journal       = {CoRR},
  volume        = {abs/1712.04046},
  year          = {2017},
  url           = {http://arxiv.org/abs/1712.04046},
  archivePrefix = {arXiv},
  eprint        = {1712.04046},
  timestamp     = {Mon, 13 Aug 2018 16:47:16 +0200},
  biburl        = {https://dblp.org/rec/bib/journals/corr/abs-1712-04046},
  bibsource     = {dblp computer science bibliography, https://dblp.org}
}

Acknowledgements

IAM image and transcription preprocessing from Laia.

Prerequisites

Python 3 (tested on Python 3.6.6)

TensorFlow 1 (tested on 1.13.1)

Required packages: distance, tqdm, pillow, matplotlib, imgaug:

pip3 install distance tqdm pillow matplotlib imgaug

Image-to-transcription on IAM:

Data Preparation

Follow the steps for IAM data preparation. IAM consists of ~10k images of handwritten text lines and their transcriptions. The code in the linked repo binarizes the images in a way that preserves the original grayscale information, converts them to JPEG, and scales them to a height of 64 pixels. It creates a folder imgs_proc for the preprocessed images and puts the transcriptions under htr/lang/char.

a01-000u-00.png → a01-000u-00.jpg (original IAM image and its preprocessed JPEG)
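The linked Laia scripts do the real preprocessing; purely as an illustration of the "convert to JPEG and scale to 64-pixel height" step (the binarization is not reproduced here), a Pillow sketch along these lines would produce a file like the one above. Pillow is among the listed requirements; the function name is made up for this example.

# Illustration only: rescale a line image to 64 px height and save it as JPEG.
from PIL import Image

def to_64px_jpeg(src_path, dst_path, height=64):
    img = Image.open(src_path).convert("L")               # keep grayscale information
    width = round(img.width * height / img.height)        # preserve aspect ratio
    img.resize((width, height), Image.LANCZOS).save(dst_path, "JPEG")

to_64px_jpeg("a01-000u-00.png", "imgs_proc/a01-000u-00.jpg")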

Create a file lines_train.txt from the transcription file tr.txt, replacing whitespace with a vertical pipe. Each line contains an image path and the corresponding characters, e.g.:

./imgs_proc/a01-000u-00.jpg A|MOVE|to|stop|Mr.|Gaitskell|from

./imgs_proc/a01-000u-01.jpg nominating|any|more|Labour|life|Peers

./imgs_proc/a01-000u-02.jpg is|to|be|made|at|a|meeting|of|Labour

Also create files lines_val.txt and lines_test.txt from htr/lang/word/va.txt and htr/lang/word/te.txt, respectively, following the same format as above.
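A minimal sketch of one way to generate these lines_*.txt files is given below. It assumes each line of the source transcription file looks like "<image-id> <space-separated transcription>"; check the actual files produced by the preprocessing step, since their exact layout may differ.

# Sketch: turn "<image-id> <words separated by spaces>" into
# "./imgs_proc/<image-id>.jpg word1|word2|..." (assumed input format).
import sys

def build_lines_file(src_path, dst_path, img_dir="./imgs_proc"):
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue
            image_id, transcription = line.split(maxsplit=1)
            transcription = "|".join(transcription.split())   # whitespace -> vertical pipe
            dst.write(f"{img_dir}/{image_id}.jpg {transcription}\n")

if __name__ == "__main__":
    # e.g. python3 build_lines.py tr.txt lines_train.txt
    build_lines_file(sys.argv[1], sys.argv[2])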

Create a file src/labels/target_vocab.txt with all unique target characters in the training set, with the number of characters equal to the value of target-vocab-size, e.g.:

!
&
0
1
:
;
A
B
a
b
|
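A short sketch of how such a file could be generated from lines_train.txt; the printed count can be checked against the value passed to target-vocab-size.

# Sketch: collect every distinct character in the training transcriptions,
# one character per line, into src/labels/target_vocab.txt.
def build_vocab(lines_path="lines_train.txt", vocab_path="src/labels/target_vocab.txt"):
    chars = set()
    with open(lines_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            _, transcription = line.split(" ", 1)   # drop the image path
            chars.update(transcription)
    with open(vocab_path, "w", encoding="utf-8") as out:
        out.write("\n".join(sorted(chars)) + "\n")
    print(f"{len(chars)} unique characters written")

if __name__ == "__main__":
    build_vocab()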

Assume that the working directory is Attention-OCR. The data files within Attention-OCR should have the structure:

src/
  labels/
    target_vocab.txt
iamdb/
  imgs_proc/ (folder of JPEG images)
  lines_train.txt
  lines_val.txt
  lines_test.txt

Train

python3 src/launcher.py \
  --phase=train \
  --data-path=lines_train.txt \
  --data-base-dir=iamdb \
  --model-dir=model_iamdb_softmax \
  --log-path=log_iamdb_train_softmax.txt \
  --reg-val=0.5 \
  --attn-num-hidden=256 \
  --attn-num-layers=2 \
  --batch-size=4 \
  --num-epoch=150 \
  --steps-per-checkpoint=500 \
  --opt-attn=softmax \
  --target-embedding-size=5 \
  --target-vocab-size=79 \
  --initial-learning-rate=0.0001 \
  --augmentation=0.1 \
  --gpu-id=0 \
  --no-load-model

You will see something like the following output in log_iamdb_train_softmax.txt:

...
09:22:22,993 root INFO Created model with fresh parameters.
2020-02-18 09:22:59,658 root INFO Generating first batch
2020-02-18 09:23:03,393 root INFO current_step: 0
2020-02-18 09:24:33,511 root INFO step 0.000000 - time: 90.118267, loss: 4.375765, perplexity: 79.500660, precision: 0.020499, CER: 0.979798, batch_len: 469.000000
2020-02-18 09:24:34,033 root INFO current_step: 1
2020-02-18 09:24:34,677 root INFO step 1.000000 - time: 0.644488, loss: 4.364702, perplexity: 78.625946, precision: 0.013305, CER: 0.986486, batch_len: 301.000000
2020-02-18 09:24:35,224 root INFO current_step: 2
2020-02-18 09:24:35,955 root INFO step 2.000000 - time: 0.731375, loss: 4.341702, perplexity: 76.838169, precision: 0.114527, CER: 0.889571, batch_len: 613.000000
2020-02-18 09:24:36,010 root INFO current_step: 3
2020-02-18 09:24:36,721 root INFO step 3.000000 - time: 0.713290, loss: 4.327676, perplexity: 75.768019, precision: 0.169855, CER: 0.830409, batch_len: 516.000000
2020-02-18 09:24:36,824 root INFO current_step: 4
2020-02-18 09:24:37,508 root INFO step 4.000000 - time: 0.686172, loss: 4.304539, perplexity: 74.035057, precision: 0.165195, CER: 0.836158, batch_len: 457.000000
2020-02-18 09:24:37,706 root INFO current_step: 5
2020-02-18 09:24:38,399 root INFO step 5.000000 - time: 0.694256, loss: 4.264017, perplexity: 71.095007, precision: 0.192181, CER: 0.805128, batch_len: 481.000000

Model checkpoints are saved in model_iamdb_softmax.

Test model and visualize attention

We provide a trained model on IAM:

wget https://www.dropbox.com/s/ujxeahr1voo0sl8/model_iamdb_softmax.tar.gz
tar -xvzf model_iamdb_softmax.tar.gz

python3 src/launcher.py \
  --phase=test \
  --visualize \
  --data-path=lines_test.txt \
  --data-base-dir=iamdb \
  --model-dir=model_iamdb_softmax \
  --log-path=log_iamdb_test.txt \
  --attn-num-hidden=256 \
  --attn-num-layers=2 \
  --opt-attn=softmax \
  --target-embedding-size=5 \
  --target-vocab-size=79 \
  --gpu-id=0 \
  --load-model \
  --output-dir=softmax_results

You will see something like the following output in log_iamdb_test.txt:

2017-05-04 20:06:32,116 root INFO Reading model parameters from model_iamdb_softmax/translate.ckpt-731000
2017-05-04 20:09:54,266 root INFO Compare word based on edit distance.
2017-05-04 20:09:57,299 root INFO step_time: 2.684323, loss: 12.952633, step perplexity: 421946.118697
2017-05-04 20:10:10,894 root INFO 0.489362 out of 1 correct
2017-05-04 20:10:11,710 root INFO step_time: 0.779765, loss: 16.425102, step perplexity: 13593499.165457
2017-05-04 20:10:22,828 root INFO 0.771970 out of 2 correct
2017-05-04 20:10:23,627 root INFO step_time: 0.776458, loss: 20.803520, step perplexity: 1083562653.786069
2017-05-04 20:10:47,098 root INFO 1.423133 out of 3 correct
2017-05-04 20:10:48,040 root INFO step_time: 0.918638, loss: 11.657264, step perplexity: 115527.486132
2017-05-04 20:11:04,398 root INFO 2.246663 out of 4 correct
2017-05-04 20:11:07,883 root INFO step_time: 3.448558, loss: 10.126567, step perplexity: 24998.394628
2017-05-04 20:11:25,554 root INFO 2.483505 out of 5 correct
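The "Compare word based on edit distance" line above refers to Levenshtein-style scoring; the distance package from the prerequisites provides this. A hypothetical illustration of a character error rate computed that way (not the repo's exact metric code):

# Illustration: character error rate as normalized Levenshtein distance.
import distance

def cer(prediction, ground_truth):
    return distance.levenshtein(prediction, ground_truth) / max(len(ground_truth), 1)

print(cer("A|MOVE|to|stop", "A|MOVE|to|stop|Mr."))   # ~0.22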

Output images are written to softmax_results (the output directory is set via the output-dir parameter; the default is results). This example visualizes attention on an image:

d01-052-00.gif

This example plots the attention alignment over an image:

att_mat.png

Parameters:

Default parameters are set in src/exp_config.py.

Control

gpu-id: ID number of the GPU to use.

phase: Whether to 'train' or 'test'. Default is 'test'.

visualize: Only valid when phase is 'test'. Outputs the attention maps over the original image. Set the flag to no-visualize to test without visualizing.

load-model: Whether to load the model from model-dir (use no-load-model to start with fresh parameters).

target-vocab-size: Target vocabulary size. Default is 39 (26 + 10 + 3): index 0 is PADDING, 1 is GO, 2 is EOS, and the remaining indices cover 0-9 and a-z.
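As an illustration of that default layout (inferred from the description above, not taken from the repository's code; the IAM setup instead uses the custom target_vocab.txt with target-vocab-size=79):

# Default index layout implied above: 0 PADDING, 1 GO, 2 EOS, then 0-9 and a-z.
SPECIAL = ["<PAD>", "<GO>", "<EOS>"]
charset = [str(d) for d in range(10)] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
idx_to_char = SPECIAL + charset                    # len == 3 + 10 + 26 == 39
char_to_idx = {ch: i for i, ch in enumerate(idx_to_char)}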

Input and output

data-base-dir: The base directory of the image paths in data-path. If the image paths in data-path are absolute, set it to /.

data-path: The file listing image paths and labels. Format per line: image_path characters.

model-dir: The directory for saving and loading model parameters (the network structure is not stored). Default is 'train'.

log-path: The path of the log file. Default is 'log.txt'.

output-dir: The directory for visualization results when visualize is set. Default is 'results'.

steps-per-checkpoint: How many steps between checkpoints (print perplexity, save the model). Default is 500.

augmentation: Probability of applying data augmentation. Default is 0.2.
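One plausible reading of that augmentation probability in code, using the imgaug package from the prerequisites (illustrative only; the augmenters actually applied are defined in the repo's data pipeline):

# Illustration: augment each image with probability p; the augmenters are examples.
import imgaug.augmenters as iaa

p = 0.2                                   # the --augmentation value
augmenter = iaa.Sometimes(p, iaa.Sequential([
    iaa.GaussianBlur(sigma=(0, 1.0)),
    iaa.Affine(rotate=(-2, 2), shear=(-2, 2)),
]))
# images_aug = augmenter(images=images)   # images: uint8 NumPy batch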

Optimization

num-epoch: The number of whole data passes. Default is 1000.

batch-size: Batch size. Only valid if phase is set to train. Default is 64.

initial-learning-rate: Initial (AdaDelta) learning rate. Default is 1.

Network

reg-val: Lambda for L2 regularization losses. Default is 0.

clip-gradients: Whether to perform gradient clipping. Default is 'True'.

max-gradient-norm: Clip gradients to this norm. Default is 5.

target-embedding-size: Embedding dimension for each target. Default is 10.

opt-attn: Which attention mechanism to use: 'softmax' (default); 'log_softmax'; 'sigmoid'; 'no_attn'.

use-gru: Use GRU for decoder (rather than LSTM). Default is 'True'.

attn-num-hidden: Number of hidden units in attention decoder cell. Default is 128.

attn-num-layers: Number of layers in the attention decoder cell. Default is 2. (The encoder's number of hidden units is attn-num-hidden * attn-num-layers.)
