https://github.com/tesseract-ocr/tesseract
Download the installer: tesseract-ocr-setup-4.00.00dev.exe
Download the language packs: chi_sim.traineddata and eng.traineddata
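(1) Set the environment variable TESSDATA_PREFIX=D:\tools\Tesseract-OCR\tessdata
(2) Add D:\tools\Tesseract-OCR to the Path environment variable
(3) Copy the language packs into the tessdata folder of the installation directory: D:\tools\Tesseract-OCR\tessdata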
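A missing or misplaced language pack is a common reason for the later commands to fail, so it can help to verify the setup first. The following is a minimal sketch, assuming the two language packs named above; TessdataCheck and missingLanguages are hypothetical names and are not part of Tesseract or Tess4J.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// TessdataCheck is a hypothetical helper, not part of Tesseract or Tess4J.
class TessdataCheck {

    // Returns the language codes whose <code>.traineddata file is absent from tessdataDir.
    static List<String> missingLanguages(Path tessdataDir, List<String> languages) {
        List<String> missing = new ArrayList<>();
        for (String lang : languages) {
            if (!Files.isRegularFile(tessdataDir.resolve(lang + ".traineddata"))) {
                missing.add(lang);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Read the directory from the environment variable configured in step (1)
        String prefix = System.getenv("TESSDATA_PREFIX");
        if (prefix == null || prefix.isEmpty()) {
            System.out.println("TESSDATA_PREFIX is not set");
            return;
        }
        List<String> missing = missingLanguages(Paths.get(prefix), Arrays.asList("chi_sim", "eng"));
        System.out.println(missing.isEmpty()
                ? "All language packs found in " + prefix
                : "Missing language packs: " + missing);
    }
}
```

Open a new cmd window before running it, because a window that was already open will not see newly added environment variables.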
The test image is test.jpg.
Open a cmd window in the directory that contains the image and run:
tesseract test.jpg test -l chi_sim
To recognize English text instead, run:
tesseract test.jpg test -l eng
Result: test.txt is generated in the current directory.
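The same command can also be launched from Java without any OCR library. This is a rough sketch, assuming tesseract is on the Path as configured above and that test.jpg is in the working directory; TesseractCli and buildCommand are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// TesseractCli is a hypothetical wrapper around the command shown above.
class TesseractCli {

    // Builds the argument list for: tesseract <image> <outputBase> -l <language>
    static List<String> buildCommand(String image, String outputBase, String language) {
        return Arrays.asList("tesseract", image, outputBase, "-l", language);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        List<String> command = buildCommand("test.jpg", "test", "chi_sim");
        // inheritIO forwards tesseract's own messages to this console
        Process process = new ProcessBuilder(command).inheritIO().start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            System.err.println("tesseract exited with code " + exitCode);
            return;
        }
        // Tesseract appends .txt to the output base name; the file is UTF-8 text
        String text = new String(Files.readAllBytes(Paths.get("test.txt")), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

Passing "eng" as the third argument of buildCommand corresponds to the English command above.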
To call Tesseract from Java, add the Tess4J Maven dependency:
<!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
<dependency>
    <groupId>net.sourceforge.tess4j</groupId>
    <artifactId>tess4j</artifactId>
    <version>4.5.4</version>
</dependency>
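import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import net.sourceforge.tess4j.Tesseract;

// OcrDemo is a placeholder class name
public class OcrDemo {
    public static void main(String[] args) {
        String imagePath = "C:\\Users\\x\\Desktop\\img\\test.jpg";
        try {
            BufferedImage textImage = ImageIO.read(new File(imagePath));
            Tesseract instance = new Tesseract();
            // Point Tess4J at the tessdata directory that holds the language packs
            instance.setDatapath("D:\\tools\\Tesseract-OCR\\tessdata");
            // Recognize Simplified Chinese
            instance.setLanguage("chi_sim");
            String result = instance.doOCR(textImage);
            System.out.println(result);
        } catch (Exception e) {
            // Report I/O and OCR failures instead of silently swallowing them
            e.printStackTrace();
        }
    }
}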
Result: the recognized text is printed to the console.
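One detail worth guarding against in the code above: ImageIO.read returns null rather than throwing when no installed reader can decode the file, which would only surface later inside the OCR call. A small sketch of a stricter loader follows; ImageLoader and load are hypothetical names, and the fallback file name test.jpg is an assumption taken from the earlier example.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// ImageLoader is a hypothetical helper; it uses only the standard library.
class ImageLoader {

    // Loads an image, failing with a clear message instead of returning null.
    static BufferedImage load(File file) throws IOException {
        if (!file.isFile()) {
            throw new IOException("Image file not found: " + file);
        }
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            // ImageIO.read signals "no suitable reader" with null
            throw new IOException("Unsupported or corrupt image: " + file);
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = load(new File(args.length > 0 ? args[0] : "test.jpg"));
        System.out.println(image.getWidth() + "x" + image.getHeight());
    }
}
```

The returned BufferedImage can then be passed to doOCR in place of the direct ImageIO.read call.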