Recognizing Text in Local and Remote Images with Tess4J

Published: 2024-04-25

pom:

        <dependency>
            <groupId>net.sourceforge.tess4j</groupId>
            <artifactId>tess4j</artifactId>
            <version>5.11.0</version>
        </dependency>

Partial code:

package com.zy.datapickcli.sys.controller;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import java.io.File;

public class TempTest {
    public static void main(String[] args) throws TesseractException {
        File file = new File("D:\\1.png");
        System.out.println(recognizeText(file));
    }

    public static String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();

        // Set the location of the trained data files (this step can be omitted for standard English recognition)
        tesseract.setDatapath("D:\\tessdata");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }
}

Download location for the trained data files (for example chi_sim.traineddata):

https://gitcode.com/tesseract-ocr/tessdata/tree/main
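
A missing language file is a common reason for doOCR to fail, so it can help to check the tessdata directory before creating the Tesseract instance. The following is a minimal sketch using only the JDK. TessdataCheck and missingLanguages are hypothetical names and are not part of Tess4J; the sketch assumes each language is stored as `<code>.traineddata` directly inside the directory passed to setDatapath, and that several languages are joined with `+` (for example `chi_sim+eng`).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Tess4J): checks that the trained data
// files for the requested languages exist in the tessdata directory.
class TessdataCheck {

    // Returns the language codes whose "<code>.traineddata" file is missing.
    // Several languages may be joined with '+', e.g. "chi_sim+eng".
    static List<String> missingLanguages(Path tessdataDir, String languages) {
        List<String> missing = new ArrayList<>();
        for (String lang : languages.split("\\+")) {
            String code = lang.trim();
            if (code.isEmpty()) {
                continue; // ignore empty segments such as a trailing '+'
            }
            if (!Files.isRegularFile(tessdataDir.resolve(code + ".traineddata"))) {
                missing.add(code);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The defaults mirror the values used in the article; adjust as needed.
        Path dir = Paths.get(args.length > 0 ? args[0] : "D:\\tessdata");
        String languages = args.length > 1 ? args[1] : "chi_sim";

        List<String> missing = missingLanguages(dir, languages);
        if (missing.isEmpty()) {
            System.out.println("All traineddata files found in " + dir);
        } else {
            System.out.println("Missing traineddata for: " + missing);
        }
    }
}
```

Running the check once at startup makes it possible to report a clear message instead of a lower-level error raised during recognition.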

Additional reference code:

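import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

@Service
public class OcrService {

    public String recognizeText(File imageFile) throws TesseractException {
        Tesseract tesseract = new Tesseract();

        // Set the location of the trained data files (this step can be omitted for standard English recognition)
        tesseract.setDatapath("path to your tessdata directory containing the language files");
        tesseract.setLanguage("chi_sim");
        return tesseract.doOCR(imageFile);
    }

    public String recognizeTextFromUrl(String imageUrl) throws Exception {
        // Download to a unique temporary file so that concurrent requests do not overwrite each other
        Path tempFile = Files.createTempFile("ocr-download-", ".jpg");
        // try-with-resources closes the stream even if the download fails
        try (InputStream in = new URL(imageUrl).openStream()) {
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
            return recognizeText(tempFile.toFile());
        } finally {
            // Remove the temporary file once recognition has finished
            Files.deleteIfExists(tempFile);
        }
    }
}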
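
As an alternative to writing the remote image to disk, the image can be decoded in memory first. The sketch below uses only the JDK; RemoteImageLoader, load, and the timeout parameter are hypothetical names introduced for illustration. It assumes the URL points to a format that ImageIO can decode (such as PNG or JPEG). With Tess4J on the classpath, the returned BufferedImage could then be passed to doOCR, which has an overload that accepts a BufferedImage.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of Tess4J): loads an image from a URL into
// memory, with timeouts, without creating a file on disk.
class RemoteImageLoader {

    static BufferedImage load(String imageUrl, int timeoutMillis) throws IOException {
        URLConnection connection = URI.create(imageUrl).toURL().openConnection();
        connection.setConnectTimeout(timeoutMillis); // avoid hanging on unreachable hosts
        connection.setReadTimeout(timeoutMillis);    // avoid hanging on stalled transfers

        // try-with-resources closes the stream even if decoding fails
        try (InputStream in = connection.getInputStream()) {
            BufferedImage image = ImageIO.read(in);
            if (image == null) {
                // ImageIO.read returns null when no registered reader can decode the data
                throw new IOException("Not a decodable image: " + imageUrl);
            }
            return image;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java RemoteImageLoader <image-url>");
            return;
        }
        BufferedImage image = load(args[0], 5000);
        System.out.println("Loaded image: " + image.getWidth() + "x" + image.getHeight());
        // With Tess4J available, the next step could be: tesseract.doOCR(image)
    }
}
```

Decoding up front also rejects responses that are not images (for example an HTML error page) before they reach the OCR engine.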

Execution result: (screenshot in the original post)

