 

Tess4j doesn't use its tessdata folder

Tags:

java

tesseract

I am using tess4j, the Java wrapper for Tesseract. I also have the normal Tesseract installed. I am not exactly sure how tess4j is meant to work, but since it comes with a tessdata folder, I assume that is where the language data files should go. However, tess4j only works if the language data files are in the "real" tessdata folder (the one that comes with Tesseract, not tess4j). If I remove that folder, I get this error message:

Error opening data file C:\Program Files\Tesseract-OCR\tessdata/jpn.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'jpn'
Tesseract couldn't load any languages!
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x631259dc, pid=5108, tid=10148
#
# JRE version: 7.0_06-b24
# Java VM: Java HotSpot(TM) Client VM (23.2-b09 mixed mode, sharing windows-x86 )
# Problematic frame:
# C  [libtesseract302.dll+0x59dc]  STRING::strdup+0x467c
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of Windows
#
# An error report file with more information is saved as:
# D:\School\Programs\OCRTest\v1.0.0\hs_err_pid5108.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Does this mean I need to have Tesseract installed to use tess4j? Why? Or maybe my tess4j tessdata folder is in the wrong place (it is currently next to my .java files, and the tess4j jars are in a lib folder that I have added to the classpath).
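To see where a relative tessdata path actually ends up, a minimal sketch like the one below may help. It assumes that a bare relative path is resolved against the JVM's working directory (the user.dir system property), not against the folder that holds your .java files, and that the data file is named <lang>.traineddata inside the tessdata folder, as the error message above suggests. The class and method names are hypothetical. Checking for the file in Java turns a missing file into a readable exception instead of the native EXCEPTION_ACCESS_VIOLATION crash.

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper: confirm <lang>.traineddata exists before handing the
// folder to tess4j, so a missing file fails in Java instead of in libtesseract.
class TessdataCheck {

    // Resolves tessdataDir (relative paths resolve against user.dir, the working
    // directory) and returns the absolute folder if it contains <lang>.traineddata.
    static File requireLanguage(String tessdataDir, String lang) throws FileNotFoundException {
        File dir = new File(tessdataDir).getAbsoluteFile();
        File trained = new File(dir, lang + ".traineddata");
        if (!trained.isFile()) {
            throw new FileNotFoundException("Missing " + trained.getPath()
                    + " (working directory is " + System.getProperty("user.dir") + ")");
        }
        return dir;
    }

    public static void main(String[] args) {
        try {
            File dir = requireLanguage("tessdata", "jpn");
            System.out.println("Found jpn.traineddata in " + dir);
        } catch (FileNotFoundException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

If the printed working directory is not the folder containing your tessdata, that alone explains why a tessdata folder placed next to the source files is never found.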

asked Aug 07 '13 at 05:08 by Kiwi Bird



4 Answers

For those who use Maven and don't want to rely on system-wide environment variables, this works for me:

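import java.io.File;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import net.sourceforge.tess4j.util.LoadLibs;

public class OcrExample {
    public static void main(String[] args) {
        File imageFile = new File("C:\\random.png");
        // Constructor style of Tess4J 2.x and later; 1.x used Tesseract.getInstance()
        ITesseract instance = new Tesseract();

        // In case you don't have your own tessdata, extract the copy bundled in the tess4j jar
        File tessDataFolder = LoadLibs.extractTessResources("tessdata");

        // Set the tessdata path
        instance.setDatapath(tessDataFolder.getAbsolutePath());

        try {
            String result = instance.doOCR(imageFile);
            System.out.println(result);
        } catch (TesseractException e) {
            System.err.println(e.getMessage());
        }
    }
}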

Found here; tested with the Maven dependency net.sourceforge.tess4j:tess4j:3.4.1 (the linked example uses the 1.4.1 jar).

answered Nov 18 '22 at 21:11 by cflorenciav



Let your TESSDATA_PREFIX environment variable point to the parent directory of your Tess4J tessdata folder (as the error message in the question says).

Usually this variable is set on the system during installation, but you may find a solution here: How do I set environment variables from Java?

You have to do this on the system that runs your app, because the native Tesseract DLLs read this environment variable.
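A Java program cannot change the environment of its own running JVM (System.getenv() returns an unmodifiable map), so one common workaround is to start the OCR program as a child process with the variable already set, using ProcessBuilder.environment(). The sketch below assumes the Tesseract 3.02 convention from the error message, where TESSDATA_PREFIX names the parent directory of tessdata. The class name TessLauncher and the main class OCRTest in the command are hypothetical placeholders.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher: runs a command (e.g. your OCR app) with TESSDATA_PREFIX set.
class TessLauncher {

    // Builds a ProcessBuilder whose child environment has TESSDATA_PREFIX pointing
    // at the parent directory of the given tessdata folder.
    static ProcessBuilder withTessdata(File tessdataDir, List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        String parent = tessdataDir.getAbsoluteFile().getParent();
        // Trailing separator added as a precaution, since the prefix is joined with "tessdata/"
        pb.environment().put("TESSDATA_PREFIX", parent + File.separator);
        pb.inheritIO(); // show the child's output in this console
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder command: replace with however you normally start your OCR program
        List<String> cmd = Arrays.asList("java", "-cp", "lib/*" + File.pathSeparator + ".", "OCRTest");
        int exit = withTessdata(new File("tessdata"), cmd).start().waitFor();
        System.out.println("exit code " + exit);
    }
}
```

Only the child process sees the variable; the launching JVM and the rest of the system are left unchanged.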

answered Nov 18 '22 at 21:11 by sschrass



The TESSDATA_PREFIX environment variable, if defined, overrides everything, including the path set by init or setDatapath. That may change in the near future, once an application can specify where its tessdata folder is.

http://code.google.com/p/tesseract-ocr/issues/detail?id=938
https://groups.google.com/forum/#!topic/tesseract-ocr/bkJwI8WmxSw
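The precedence rule described above can be modeled in a few lines of plain Java. This is only an illustration of the rule as stated in this answer, for the Tesseract versions current at the time, not Tesseract's actual code; the class and method names are hypothetical. Printing the result is a quick way to see why a setDatapath call can appear to be ignored on a machine where TESSDATA_PREFIX is defined.

```java
import java.util.Map;

// Hypothetical model of the rule above: a defined TESSDATA_PREFIX wins over setDatapath.
class DatapathPrecedence {

    static String effectivePrefix(Map<String, String> env, String configuredDatapath) {
        String fromEnv = env.get("TESSDATA_PREFIX");
        if (fromEnv != null) {
            return fromEnv; // environment variable overrides init/setDatapath
        }
        return configuredDatapath; // used only when the variable is not defined
    }

    public static void main(String[] args) {
        String path = effectivePrefix(System.getenv(), System.getProperty("user.dir"));
        System.out.println("Tesseract would look under: " + path);
    }
}
```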

answered Nov 18 '22 at 21:11 by nguyenq



Maybe you don't have the tessdata folder in your main project folder. This folder contains the data for every language Tesseract supports (files with .traineddata, .bigrams, .fold, .lm, .nn, .params, .size and .word-freq extensions). If you don't have it, follow these steps:

  1. Download the tessdata repository from github.com/tesseract-ocr/tessdata (using the Download ZIP button).
  2. Unzip the contents of tessdata-master.zip into your main project folder.
  3. Rename the tessdata-master folder to tessdata.
  4. Run your Java project and check that it works. At least this works for me.
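After unzipping, one quick way to confirm the folder is where your program expects it is to list the language codes it contains. The sketch assumes the folder is named tessdata and sits in the working directory of the running program (often, but not always, the project folder when run from an IDE), and it counts only *.traineddata files; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical check: list the languages available in a tessdata folder.
class ListLanguages {

    // Returns the sorted language codes of all *.traineddata files in dir.
    static List<String> languages(Path dir) throws IOException {
        List<String> langs = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.traineddata")) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                langs.add(name.substring(0, name.length() - ".traineddata".length()));
            }
        }
        Collections.sort(langs);
        return langs;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("tessdata").toAbsolutePath();
        System.out.println(dir + ": " + languages(dir));
    }
}
```

If the language you pass to Tesseract is missing from the printed list, the folder is either in the wrong place or missing that file.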
answered Nov 18 '22 at 21:11 by José Mercado
