I'm trying to train Tesseract on Windows, and for that I need a TIFF/box file pair. I tried to create it with jTessBoxEditor, but it doesn't accept my images as input. I've also tried boxFactory, but it doesn't run properly. Does anyone know the best tool for creating the pair from images?
Thanks
The image and box files aren’t fed directly into the trainer. Instead, Tesseract works with special *.lstmf files, each of which combines the image, boxes and text from one pair of *.tif and *.box files. In order to generate those *.lstmf files you’ll need to run the following:
AFAIK, training is currently only supported with the synthetic box/tiff pairs generated via tesstrain.sh. See #768 for more details.
@Shreeshrii, I have changed the format of the box files according to the requirements of Tesseract 4.0; namely, I added a TAB at the end of each line and spaces to demarcate words.
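To check that a box file follows the convention described above, a small parser can help. This is only a sketch: it assumes each box file line has the form "symbol left bottom right top page", that a line whose symbol is a TAB marks the end of a text line, and that a line whose symbol is a space separates words. The names Box, parse_box_line and count_line_markers are made up for this example and the coordinates are invented.

```python
from typing import NamedTuple


class Box(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    """Parse one box file line of the assumed form 'symbol l b r t page'."""
    # Split from the right so a symbol that is itself a space or TAB survives.
    symbol, *numbers = line.rstrip("\r\n").rsplit(" ", 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return Box(symbol, left, bottom, right, top, page)


def count_line_markers(lines) -> int:
    """Count entries whose symbol is a TAB (assumed end-of-line marker)."""
    return sum(
        1
        for line in lines
        if line.rstrip("\r\n") and parse_box_line(line).symbol == "\t"
    )


# Invented sample: the word "Hi", a space entry, "y", then a TAB entry.
sample = [
    "H 10 20 30 60 0",
    "i 32 20 40 60 0",
    "  42 20 50 60 0",
    "y 52 20 70 60 0",
    "\t 70 20 71 60 0",
]
print(count_line_markers(sample))
```

If the count of TAB entries does not match the number of text lines in the image, the box file probably still needs editing.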
After the tmp directory is created, copy the box and tif files to that directory. You should also give at least one font and a training text as input, so that they are used for training along with your box/tiff pairs. Run the process, then check the log file and console output to verify that all files are being picked up.
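Before running the process, it may be worth confirming that every image in the tmp directory has a matching box file. The following is a minimal sketch, not part of tesstrain.sh; the function name unmatched_pairs is hypothetical, and it assumes a pair shares the same base name (for example x.exp0.tif and x.exp0.box).

```python
from pathlib import Path


def unmatched_pairs(directory):
    """Return (tifs_without_box, boxes_without_tif) as sorted base names."""
    folder = Path(directory)
    # Path.stem drops only the last suffix, so "x.exp0.tif" -> "x.exp0".
    tifs = {
        p.stem
        for p in folder.iterdir()
        if p.suffix.lower() in (".tif", ".tiff")
    }
    boxes = {p.stem for p in folder.glob("*.box")}
    return sorted(tifs - boxes), sorted(boxes - tifs)
```

Calling unmatched_pairs on the tmp directory should return two empty lists when every tif has its box; anything else names the files that would be skipped or orphaned.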
In general, the training steps for Tesseract are:
1. Merge the training data into a .tiff file using jTessBoxEditor.
2. Create the training labels: generate a .box file containing Tesseract's predictions for the .tiff file, and fix each inaccurate prediction.
3. Train Tesseract.
If you have jTessBoxEditor, then you already have the Tesseract binaries. Go to the tesseract-ocr subfolder of jTessBoxEditor and run the following command:
tesseract.exe D:\testocr\TestImage.tif D:\testocr\TestImage batch.nochop makebox
It should generate the file D:\testocr\TestImage.box. Then in jTessBoxEditor, go to Box Editor tab and open your image. The box file is automatically loaded, you can check if everything is ok and correct possible mistakes.
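If you have many images, the same makebox command can be repeated for each one from a script. This is a rough sketch assuming Python is installed; the helper names makebox_command and make_boxes are invented here, and the only Tesseract arguments used are the ones from the command above.

```python
import subprocess
from pathlib import Path


def makebox_command(tesseract_exe, image_path):
    """Build the argument list for one image, mirroring the command above."""
    image = Path(image_path)
    # Tesseract appends ".box" to the output base name itself.
    output_base = image.with_suffix("")
    return [
        str(tesseract_exe),
        str(image),
        str(output_base),
        "batch.nochop",
        "makebox",
    ]


def make_boxes(tesseract_exe, image_dir):
    """Run makebox on every *.tif file found directly in image_dir."""
    for image in sorted(Path(image_dir).glob("*.tif")):
        # check=True raises CalledProcessError if Tesseract exits with an error.
        subprocess.run(makebox_command(tesseract_exe, image), check=True)
```

For example, make_boxes("tesseract.exe", r"D:\testocr"), run from the tesseract-ocr subfolder, should leave one .box file next to each .tif. Each box file still has to be checked and corrected in the Box Editor tab.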