
How to generate a tiff/box file from an image to train Tesseract in Windows

I'm trying to train Tesseract on Windows, and for that I need a tiff/box file pair. I tried to create one with jTessBoxEditor, but it doesn't accept images as input. I've also tried boxFactory, but it doesn't run properly. Does anyone know the best tool for creating the pair from images?

Thanks

greenlasagna asked Jul 31 '15 16:07



1 Answer

If you have jTessBoxEditor, then you already have the Tesseract binaries. Go to the tesseract-ocr subfolder of jTessBoxEditor and run the following command:

tesseract.exe D:\testocr\TestImage.tif D:\testocr\TestImage batch.nochop makebox

It should generate the file D:\testocr\TestImage.box. Then, in jTessBoxEditor, go to the Box Editor tab and open your image. The box file is loaded automatically, so you can check that everything is OK and correct any mistakes.
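If there are many images, the same command can be scripted. The following is only a minimal Python sketch, not part of jTessBoxEditor or Tesseract: the names `build_makebox_command`, `parse_box_line`, `make_boxes` and `Box` are made up for illustration, and it assumes the `tesseract` executable is on the PATH (otherwise pass the full path to tesseract.exe). The parser assumes the classic box format, one line per character as `<char> <left> <bottom> <right> <top> <page>`, with coordinates measured from the bottom-left corner of the image.

```python
import subprocess
from pathlib import Path
from typing import NamedTuple


class Box(NamedTuple):
    """One box-file entry (hypothetical helper type for this sketch)."""
    char: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def build_makebox_command(image, tesseract="tesseract"):
    """Return the argument list for the makebox command shown above."""
    image = Path(image)
    # Tesseract appends ".box" to the output base name by itself.
    output_base = image.with_suffix("")
    return [tesseract, str(image), str(output_base), "batch.nochop", "makebox"]


def parse_box_line(line):
    """Parse one line of the form '<char> <left> <bottom> <right> <top> <page>'."""
    char, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(char, int(left), int(bottom), int(right), int(top), int(page))


def make_boxes(folder, tesseract="tesseract"):
    """Run makebox on every .tif file in folder (needs Tesseract installed)."""
    for image in sorted(Path(folder).glob("*.tif")):
        subprocess.run(build_makebox_command(image, tesseract), check=True)
```

For example, `make_boxes(r"D:\testocr")` would run the command once per .tif file in that folder, and `parse_box_line` can be used to sanity-check the generated .box files before opening them in the Box Editor tab.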

darkpotpot answered Oct 19 '22 09:10