Tesseract for Java: setting TESSDATA_PREFIX for an executable jar
The ultimate goal of this project is to take the jar and put it in a directory where it uses Tesseract and outputs a results directory and the output txt file. I am having some issues with Tesseract, though. I am working with tess4j in Java with Maven and I want to make my code into an executable jar. The project works fine as a desktop app, but whenever I try to run it using java -jar fileName.jar (after exporting to a jar) it gives me this error:
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory
Failed loading language 'eng'
...
I looked online and couldn't really find out how to set up Tesseract for a jar and get the paths right. I use Maven and have the Tesseract dependency in my pom file (tess4j, version 3.0), and I have the tessdata in my project.
I am fairly new to Maven and jar files and have never used Tesseract before, but as far as I can tell from the internet I set it up correctly.
Does anyone know how to make tess4j point to the tessdata directory in my project and have a dynamic path, so I can move it and use it on multiple computers and in multiple places?
This is how I call Tesseract:
Tesseract instance = new Tesseract();
instance.setDatapath("src/main/resources");
String result = instance.doOCR(imageFile);
String fileName = imageFile.getName().replace(".jpg", "");
System.out.println("Parsed Image " + fileName);
return result;
EDIT
This is how I tried to set the environment variable TESSDATA_PREFIX in my code:
String dir = System.getProperty("user.dir");
System.out.println("current dir = " + dir);
ProcessBuilder pb = new ProcessBuilder("CMD", "/C", "SET");
Map<String, String> env = pb.environment();
env.put("TESSDATA_PREFIX", dir + "\\tessdata");
Process p = pb.start();
But this had no discernible effect. I still got the same error.
EDIT 2
According to the error message, I need to set it to the parent directory of tessdata. I also tried this, to no avail.
EDIT 3
After a ton of searching and trying to fix it, I am not sure it is even possible. The doOCR method in Tesseract takes a BufferedImage or a File, which would be all right if my images weren't dynamic, but they are, so I can't really store them in the jar. Not to mention that TESSDATA_PREFIX still won't set. If anyone has any ideas I am still all ears, and I will keep looking for a solution, but I'm not sure it will work at all.
java
maven
tesseract
executable-jar
tess4j
asked Mar 22 '16 at 22:07, edited Mar 24 '16 at 15:47, by Ian
Might this help? stackoverflow.com/questions/18095708/… – Shmulik Klein Mar 22 '16 at 23:41
@ShmulikKlein Nope, didn't work for me. I'll add an edit with how I set the environment variables. I got the same error. – Ian Mar 23 '16 at 15:20
So the problem is that I have the tessdata in my project hierarchy. I can't really pull this out because a system may not have it, so I need to find a way to still load the tessdata while having the jar be executable. – Ian Mar 23 '16 at 16:01
2 Answers
You can invoke the instance.setDatapath method to point Tesseract to the location of your tessdata folder.
answered Mar 24 '16 at 1:35 by nguyenq
Yeah, I already do that. The problem is that jars don't have a "folder". – Ian Mar 24 '16 at 13:40
I added an edit in the question to show this. – Ian Mar 24 '16 at 13:51
If you packaged tessdata in your JAR file, you'd need to extract it first to the local filesystem and set the data path to that. – nguyenq Mar 24 '16 at 23:19
How would I go about doing that? – Ian Mar 25 '16 at 13:40
See stackoverflow.com/questions/17745788/… or stackoverflow.com/questions/11472408/… – nguyenq Mar 25 '16 at 13:56
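The extraction step suggested in the comments could look roughly like the sketch below. It is only an illustration under assumptions: the class name TessdataExtractor and the method extract are hypothetical, and it assumes the traineddata files are packaged on the classpath under /tessdata (for a Maven project, that would typically mean src/main/resources/tessdata). It uses only the JDK, so the tess4j call appears in a comment rather than in the code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of tess4j.
public class TessdataExtractor {

    /**
     * Copies the named classpath resources from /tessdata into a new temporary
     * directory and returns the PARENT directory that contains "tessdata".
     */
    public static Path extract(String... fileNames) throws IOException {
        Path parent = Files.createTempDirectory("tess");
        Path tessdata = Files.createDirectories(parent.resolve("tessdata"));
        for (String name : fileNames) {
            String resource = "/tessdata/" + name; // assumed location inside the jar
            try (InputStream in = TessdataExtractor.class.getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Resource not found on classpath: " + resource);
                }
                Files.copy(in, tessdata.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return parent;
    }

    // Possible usage with tess4j (untested here):
    //   Path parent = TessdataExtractor.extract("eng.traineddata");
    //   instance.setDatapath(parent.toString());
}
```

The error message in the question asks for the parent directory of tessdata, which is why the sketch returns the parent. Other tess4j versions may expect the tessdata folder itself, so it is worth trying parent.resolve("tessdata") if the parent does not work.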
It randomly started working when I:
1. put the tessdata folder in the same directory as my jar,
2. changed the setDatapath call to the following:
Tesseract instance = new Tesseract();
instance.setDatapath(".");
String result = instance.doOCR(imageFile);
String fileName = imageFile.getName().replace(".jpg", "");
System.out.println("Parsed Image " + fileName);
return result;
3. exported from Eclipse by right-clicking the project, selecting java -> runnable jar, then setting the option "Extract Required Libraries into Generated Jars".
(Side note: the environment setting I was doing earlier no longer needs to be in the project.)
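A likely reason the ProcessBuilder attempt in the question had no effect: the map returned by ProcessBuilder.environment() only applies to child processes started from that builder, not to the JVM that is already running (and which loads Tesseract). The small check below illustrates this. The class and method names are hypothetical and chosen for this sketch only.

```java
import java.util.Map;

// Hypothetical demo class, for illustration only.
public class EnvScope {

    /**
     * Puts a variable into a ProcessBuilder's environment map and reports
     * whether the current JVM can see it afterwards.
     */
    public static boolean leaksIntoCurrentJvm(String name, String value) {
        ProcessBuilder pb = new ProcessBuilder("java", "-version"); // never started here
        Map<String, String> childEnv = pb.environment();
        childEnv.put(name, value); // affects only processes started from pb
        return value.equals(System.getenv(name));
    }
}
```

Calling EnvScope.leaksIntoCurrentJvm("TESSDATA_PREFIX", "some/path") should return false unless the variable already had that exact value before the JVM started.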
I really thought I had tried this, but I guess something must have been wrong. I removed tessdata from my project and will have to include it wherever the jar is run. I'm not really sure why it started working, but I'm glad it did.
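One caveat with setDatapath("."): a relative path is resolved against the working directory, so it only finds tessdata when the jar is launched from its own folder. A possible way to make this independent of the working directory is to look up where the jar itself lives. This is a sketch only; JarLocation and directoryOf are hypothetical names, and it falls back to the working directory when the location cannot be determined.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper, not part of tess4j.
public class JarLocation {

    /** Returns the directory holding the jar (or classes folder) that `anchor` was loaded from. */
    public static File directoryOf(Class<?> anchor) {
        File fallback = new File(System.getProperty("user.dir"));
        CodeSource source = anchor.getProtectionDomain().getCodeSource();
        URL url = (source == null) ? null : source.getLocation();
        if (url == null || !"file".equals(url.getProtocol())) {
            return fallback; // location unknown or not on the local filesystem
        }
        try {
            File location = new File(url.toURI());
            // From a jar, location is the jar file; from an IDE, it is a classes directory.
            return location.isFile() ? location.getParentFile() : location;
        } catch (URISyntaxException | IllegalArgumentException e) {
            return fallback;
        }
    }
}
```

With this, the call in the snippet above could become instance.setDatapath(JarLocation.directoryOf(Main.class).getAbsolutePath()), where Main stands for whatever class is in the jar.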
answered Mar 24 '16 at 20:21 by Ian