I successfully built the traineddata file for a new tesseract language, but when I run tesseract with it, I keep getting the following error:
index >= 0 && index < size_used_:Error:Assert failed:in file ../ccutil/genericvector.h, line 657
This happens even when I run tesseract on an image I trained with! I am confused about what is going on, because I would not expect the error to occur on the training set itself.
This error is caused by the lack of a lang.shapetable component in your lang.traineddata file.
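To confirm the diagnosis before rebuilding, you can inspect the traineddata header directly. The Python sketch below assumes the Tesseract 3.x/4.x container layout (a little-endian 32-bit entry count followed by that many 64-bit offsets, where a negative offset means the component is absent) and that the shape table sits at index 13 (TESSDATA_SHAPE_TABLE). Treat both as assumptions and check them against tessdatamanager.h for your Tesseract version. The function name has_shapetable is hypothetical.

```python
import struct
import sys

# Assumption: position of TESSDATA_SHAPE_TABLE in the TessdataType enum (3.x/4.x).
SHAPE_TABLE_INDEX = 13


def has_shapetable(path):
    """Return True if the traineddata header lists a shape table component.

    Assumed layout: int32 entry count, then that many int64 offsets;
    a negative offset means the component was not packed into the file.
    """
    with open(path, "rb") as f:
        (num_entries,) = struct.unpack("<i", f.read(4))
        # Sanity bound chosen for this sketch, not Tesseract's own limit.
        if not 0 < num_entries <= 1000:
            raise ValueError("unexpected header (big-endian or not a traineddata file?)")
        offsets = struct.unpack(f"<{num_entries}q", f.read(8 * num_entries))
    return num_entries > SHAPE_TABLE_INDEX and offsets[SHAPE_TABLE_INDEX] >= 0


if __name__ == "__main__":
    # Usage: python check_shapetable.py lang.traineddata
    for p in sys.argv[1:]:
        print(p, "has shapetable" if has_shapetable(p) else "MISSING shapetable")
```

If this reports the shape table as missing, the steps below apply.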
Make sure that you generate the shapetable:
shapeclustering -F font_properties -U unicharset lang.font.exp0.box.tr
This will create a file named shapetable. You will need to rename it to lang.shapetable before you can combine everything:
combine_tessdata lang.
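The three steps (generate, rename, combine) can be scripted so the rename is not forgotten. This is a minimal Python sketch, assuming shapeclustering and combine_tessdata are on PATH and that font_properties, unicharset and the other lang.* components already exist in the working directory. The function name rebuild_traineddata is hypothetical, and the run parameter exists only so the command runner can be swapped out for testing.

```python
import os
import subprocess


def rebuild_traineddata(lang, tr_files, workdir=".", run=subprocess.run):
    """Regenerate the shape table and repack <lang>.traineddata."""
    # Step 1: cluster shapes; shapeclustering writes a file named "shapetable".
    run(["shapeclustering", "-F", "font_properties", "-U", "unicharset", *tr_files],
        cwd=workdir, check=True)
    # Step 2: add the "<lang>." prefix so combine_tessdata picks the file up.
    os.replace(os.path.join(workdir, "shapetable"),
               os.path.join(workdir, f"{lang}.shapetable"))
    # Step 3: combine all "<lang>.*" components (note the trailing dot).
    run(["combine_tessdata", f"{lang}."], cwd=workdir, check=True)
    return os.path.join(workdir, f"{lang}.traineddata")


if __name__ == "__main__":
    print(rebuild_traineddata("lang", ["lang.font.exp0.box.tr"]))
```

Using os.replace rather than copying means a stale, unprefixed shapetable cannot be left behind to confuse a later run.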