Python Text Data Augmentation: A Tool for Scene Text Image Data Augmentation


weixin_39766109 · Published 2020-12-08 17:33:06

Text Image Augmentation

A general geometric augmentation tool for text images, from the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition". The tool helps text recognizers avoid overfitting and gain robustness.

Note that this is a general toolkit. Please customize it for your specific task. If the repo benefits your work, please cite the papers.

News

2020-02 The paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition" was accepted to CVPR 2020. It is a preliminary attempt at smart augmentation.

2019-11 The paper "Decoupled Attention Network for Text Recognition" was accepted to AAAI 2020. This augmentation tool was used in its handwritten text recognition experiments.

2019-04 We applied this tool in the ReCTS competition of ICDAR 2019. Our ensemble model won the championship.

2019-01 The similarity transformation was specifically customized for geometric augmentation of text images.
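The similarity transformation in the 2019-01 entry presumably refers to the moving least squares deformation of Schaefer et al., which is cited in the Citation section. The following is a minimal sketch of that idea for a single point, not the repository's implementation: the function name `mls_similarity`, the use of complex numbers for 2D points, and the `alpha` default are assumptions made for illustration.

```python
def mls_similarity(v, src, dst, alpha=1.0):
    """Map point v with a moving-least-squares similarity deformation.

    v is a complex number x + yj. src and dst are equal-length lists of
    complex control points (original and moved positions).
    """
    weights = []
    for p, q in zip(src, dst):
        d2 = abs(p - v) ** 2
        if d2 == 0.0:
            return q  # a control point maps exactly to its target
        weights.append(1.0 / d2 ** alpha)

    total = sum(weights)
    # Weighted centroids of the source and target control points.
    p_star = sum(w * p for w, p in zip(weights, src)) / total
    q_star = sum(w * q for w, q in zip(weights, dst)) / total

    # Best similarity (rotation plus uniform scale) as one complex factor a,
    # minimizing sum_i w_i * |a * (p_i - p_star) - (q_i - q_star)|^2.
    num = sum(w * (p - p_star).conjugate() * (q - q_star)
              for w, p, q in zip(weights, src, dst))
    mu = sum(w * abs(p - p_star) ** 2 for w, p in zip(weights, src))
    if mu == 0.0:
        return v - p_star + q_star  # degenerate case: pure translation
    return (num / mu) * (v - p_star) + q_star
```

Because the weights depend on `v`, each pixel gets its own locally fitted similarity, which is what lets a few moved control points bend a text image smoothly. For text images, the control points would typically sit along the top and bottom borders, but that placement is also an assumption here.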

Requirements

We recommend Anaconda to manage the version of your dependencies. For example:

conda install boost=1.67.0

Installation

Build library:

mkdir build

cd build

cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..

make

Copy Augment.so to the target folder and follow demo.py to use the tool.

cp Augment.so ..

cd ..

python demo.py

Demo

Distortion

Stretch

Perspective

Speed

Transforming an image of size (H: 64, W: 200) takes less than 3 ms on a 2.0 GHz CPU. The process can be accelerated further by running the augmentation on the fly in multi-process batch samplers, for example by setting "num_workers" in PyTorch.
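PyTorch handles this through the "num_workers" argument of its data loader. The same idea can be sketched with only the Python standard library, as below. This is an illustration under stated assumptions: `augment` is a hypothetical stand-in for a call into the Augment library (it only rotates rows of a nested list), and `make_pool` and `augmented_batches` are names invented for this example.

```python
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def augment(image):
    """Hypothetical stand-in for a call into the Augment library.

    Rotates every row of a 2D list horizontally by one random offset.
    """
    shift = random.randint(-3, 3)
    return [row[-shift:] + row[:-shift] for row in image]


def make_pool(workers, use_processes=True):
    """Create the worker pool; processes sidestep the GIL for CPU-bound work."""
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    return pool_cls(max_workers=workers)


def augmented_batches(images, batch_size, pool):
    """Yield batches whose images are augmented afresh by the pool's workers."""
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        yield list(pool.map(augment, batch))
```

A training loop would iterate over `augmented_batches(images, 8, pool)` inside `with make_pool(4) as pool:`, so every epoch sees newly distorted samples. When a process pool is used, that call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.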

Improvement for Recognition

We compare the accuracies of CRNN trained using only the corresponding small training set.

| Dataset | IIIT5K | IC13 | IC15 |
| --- | --- | --- | --- |
| Without Data Augmentation | 40.8% | 6.8% | 8.7% |
| With Data Augmentation | 53.4% | 9.6% | 24.9% |
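To make the size of the improvement explicit, the short sketch below computes the absolute gain (in percentage points) and the relative gain for each dataset. The accuracies are copied from the comparison above; the helper name `gains` is an assumption made for this example.

```python
# Accuracies (%) from the comparison above: (without, with) data augmentation.
results = {
    "IIIT5K": (40.8, 53.4),
    "IC13": (6.8, 9.6),
    "IC15": (8.7, 24.9),
}


def gains(without, with_aug):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = round(with_aug - without, 1)
    relative = round(100.0 * (with_aug - without) / without, 1)
    return absolute, relative


for name, (without, with_aug) in results.items():
    absolute, relative = gains(without, with_aug)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```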

Citation

@inproceedings{luo2020learn,
  author = {Canjie Luo and Yuanzhi Zhu and Lianwen Jin and Yongpan Wang},
  title = {Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition},
  booktitle = {CVPR},
  year = {2020}
}

@inproceedings{wang2020decoupled,
  author = {Tianwei Wang and Yuanzhi Zhu and Lianwen Jin and Canjie Luo and Xiaoxue Chen and Yaqiang Wu and Qianying Wang and Mingxiang Cai},
  title = {Decoupled attention network for text recognition},
  booktitle = {AAAI},
  year = {2020}
}

@article{schaefer2006image,
  title = {Image deformation using moving least squares},
  author = {Schaefer, Scott and McPhail, Travis and Warren, Joe},
  journal = {ACM Transactions on Graphics (TOG)},
  volume = {25},
  number = {3},
  pages = {533--540},
  year = {2006},
  publisher = {ACM New York, NY, USA}
}

Acknowledgment

Thanks for the contribution of the following developers.

Attention

The tool is only free for academic research purposes.
