I am using tess4j, the Java wrapper for Tesseract. I also have the normal Tesseract installed. I am not exactly sure how tess4j is meant to work, but since it comes with a tessdata folder, I assume that you would put the language data files there. However, tess4j only works if the language data files are in the "real" tessdata folder (the one that comes with Tesseract, not tess4j). If I remove that folder, I get this error message:
Error opening data file C:\Program Files\Tesseract-OCR\tessdata/jpn.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'jpn'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x631259dc, pid=5108, tid=10148
#
# JRE version: 7.0_06-b24
# Java VM: Java HotSpot(TM) Client VM (23.2-b09 mixed mode, sharing windows-x86 )
# Problematic frame:
# C [libtesseract302.dll+0x59dc] STRING::strdup+0x467c
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\School\Programs\OCRTest\v1.0.0\hs_err_pid5108.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Does this mean I need to have Tesseract installed to use tess4j? Why? Or maybe my tess4j tessdata folder is in the wrong place (it is currently with my .java files; the tess4j jars are in a lib folder that I have put on the classpath).
For those who use Maven and don't like to use global variables, this works for me:
import java.io.File;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.LoadLibs;

public class OcrExample {
    public static void main(String[] args) {
        File imageFile = new File("C:\\random.png");
        Tesseract instance = Tesseract.getInstance();
        // In case you don't have your own tessdata, let it also be extracted for you
        File tessDataFolder = LoadLibs.extractTessResources("tessdata");
        // Set the tessdata path
        instance.setDatapath(tessDataFolder.getAbsolutePath());
        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}
Found here; tested with the Maven artifact net.sourceforge.tess4j:tess4j:3.4.1. Note that the linked example uses the 1.4.1 jar.
Set your TESSDATA_PREFIX environment variable so that Tesseract finds the tessdata folder of your Tess4j. As the error message says, the variable should name the parent directory of that tessdata folder.
Usually you set up this variable on the system during installation, but you may find a solution here: How do I set environment variables from Java?
You have to do it on the system that runs your app because the native Tesseract .dlls depend on this environment variable.
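To see what the native library will look for before calling into it, a small check like the one below can help. It is only a sketch: the class and method names are made up for illustration, and the path layout (prefix, then tessdata, then the language name plus .traineddata) is taken from the error message in the question, so it may not hold for other Tesseract versions.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tess4J.
class TessdataPrefixCheck {

    // Builds the file path from the error message in the question:
    // <prefix>/tessdata/<language>.traineddata, where <prefix> is the
    // PARENT of the tessdata directory (an assumption based on that message).
    static Path expectedDataFile(String prefix, String language) {
        return Paths.get(prefix, "tessdata", language + ".traineddata");
    }

    // Returns a short diagnosis for the given prefix value and language.
    static String diagnose(String prefix, String language) {
        if (prefix == null || prefix.isEmpty()) {
            return "TESSDATA_PREFIX is not set";
        }
        Path file = expectedDataFile(prefix, language);
        return Files.isRegularFile(file) ? "found " + file : "missing " + file;
    }

    public static void main(String[] args) {
        // Language defaults to "jpn", the one used in the question.
        String language = args.length > 0 ? args[0] : "jpn";
        System.out.println(diagnose(System.getenv("TESSDATA_PREFIX"), language));
    }
}
```

Running it in the same environment as the application shows whether the variable is visible to the JVM and whether the language file sits where the prefix implies.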
The TESSDATA_PREFIX environment variable, if defined, will overrule everything, including whatever is set by init or setDatapath; but that may change in the near future, when an application will be able to specify where its tessdata folder is.
http://code.google.com/p/tesseract-ocr/issues/detail?id=938
https://groups.google.com/forum/#!topic/tesseract-ocr/bkJwI8WmxSw
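The precedence described in this answer can be modelled in a few lines. This is not Tesseract's or Tess4J's actual code, only a sketch of the rule as stated here (the environment variable wins whenever it is defined); the class and method names are hypothetical.

```java
// Hypothetical model of the precedence rule described above.
class DatapathPrecedence {

    // envPrefix: value of TESSDATA_PREFIX, or null if it is not defined.
    // configuredPath: the path the application passed to setDatapath.
    static String effectivePath(String envPrefix, String configuredPath) {
        return envPrefix != null ? envPrefix : configuredPath;
    }

    // True when the application's own setting is silently ignored.
    static boolean isOverridden(String envPrefix, String configuredPath) {
        return envPrefix != null && !envPrefix.equals(configuredPath);
    }

    public static void main(String[] args) {
        String env = System.getenv("TESSDATA_PREFIX");
        // Example value only; substitute the path your application configures.
        String configured = args.length > 0 ? args[0] : "tessdata";
        System.out.println("effective path: " + effectivePath(env, configured));
        if (isOverridden(env, configured)) {
            System.out.println("warning: TESSDATA_PREFIX overrides the configured path");
        }
    }
}
```

Logging such a warning at startup makes it easier to spot why a setDatapath call appears to have no effect on a machine that also has a standalone Tesseract installation.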
Maybe you don't have the tessdata folder in your main project folder.
This folder holds the data for all Tesseract-supported languages (it contains files with the .traineddata, .bigrams, .fold, .lm, .nn, .params, .size and .word-freq extensions).
If you don't have it, follow these steps:
1. Extract the tessdata-master.zip file in your main project folder.
2. Rename tessdata-master to tessdata.
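The rename step, plus a quick check of which languages ended up in the folder, could be scripted roughly as follows. This is a sketch using only the standard library; the class and method names are invented for illustration, and it assumes the archive has already been extracted into the project folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tess4J.
class TessdataSetup {

    private static final String SUFFIX = ".traineddata";

    // Renames <projectDir>/tessdata-master to <projectDir>/tessdata
    // unless a tessdata folder already exists. Returns the tessdata path.
    static Path ensureTessdata(Path projectDir) throws IOException {
        Path target = projectDir.resolve("tessdata");
        Path extracted = projectDir.resolve("tessdata-master");
        if (!Files.isDirectory(target) && Files.isDirectory(extracted)) {
            Files.move(extracted, target);
        }
        return target;
    }

    // Lists language codes that have a .traineddata file, sorted by name.
    static List<String> installedLanguages(Path tessdata) throws IOException {
        if (!Files.isDirectory(tessdata)) {
            return Collections.emptyList();
        }
        try (Stream<Path> files = Files.list(tessdata)) {
            return files.map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(SUFFIX))
                    .map(name -> name.substring(0, name.length() - SUFFIX.length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Uses the current working directory as the project folder by default.
        Path projectDir = Paths.get(args.length > 0 ? args[0] : ".");
        Path tessdata = ensureTessdata(projectDir);
        System.out.println("languages in " + tessdata + ": " + installedLanguages(tessdata));
    }
}
```

If the language you pass to Tesseract is missing from the printed list, the "Failed loading language" error from the question is the likely result.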