I'm performing OCR on some images using Tesseract 2.04, and now I need the precise position of the recognized text. But this version doesn't return this information.
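Tesseract 2.04 only gives you plain text. Later Tesseract releases (3.x onward) can write hOCR, an HTML-based format in which each recognized word carries a `bbox x0 y0 x1 y1` pixel box in its `title` attribute. The following is a minimal sketch that pulls words and boxes out of such output. It assumes word spans use the class `ocrx_word` (or `ocr_word`) and list `class` before `title`, which is how the Tesseract hOCR output I have seen is laid out. The class name `HocrWords` and its methods are hypothetical, and a regex is only a stand-in for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts words and pixel bounding boxes from hOCR text. */
final class HocrWords {

    /** One recognized word; coordinates are image pixels, origin top-left. */
    static final class Word {
        final String text;
        final int x0, y0, x1, y1;

        Word(String text, int x0, int y0, int x1, int y1) {
            this.text = text;
            this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        }

        @Override
        public String toString() {
            return text + " [" + x0 + "," + y0 + "," + x1 + "," + y1 + "]";
        }
    }

    // Assumption: a word looks like
    // <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 90'>Hello</span>
    private static final Pattern WORD = Pattern.compile(
            "<span[^>]*class=['\"]ocrx?_word['\"][^>]*title=['\"]bbox (\\d+) (\\d+) (\\d+) (\\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
            Pattern.DOTALL);
    // Inner markup such as <strong> or <em> that may wrap the word text.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static List<Word> parse(String hocr) {
        List<Word> words = new ArrayList<>();
        Matcher m = WORD.matcher(hocr);
        while (m.find()) {
            String text = unescape(TAG.matcher(m.group(5)).replaceAll("").trim());
            if (text.isEmpty()) {
                continue; // skip boxes that contain no recognized characters
            }
            words.add(new Word(text,
                    Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
        }
        return words;
    }

    // Decodes the few entities hOCR commonly contains; &amp; goes last to avoid double decoding.
    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&#39;", "'")
                .replace("&amp;", "&");
    }
}
```

Line-level spans (`ocr_line`) are skipped because their class does not match, so you get one entry per word, which is the granularity you want for placing text over a scan.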
I need this to generate a searchable PDF file. I already know how to stamp text onto an underlying layer of the PDF, but I need the position at which to stamp it. My first idea is to perform OCR on the PDF, get the text and its position, and then stamp that text into the PDF with the iText API.
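OCR boxes are in image pixels with the origin at the top-left and y growing downward, while PDF user space is in points (1/72 inch) with the origin at the bottom-left. Below is a minimal sketch of that conversion, assuming the scanned image was drawn at (0,0) and fills the page exactly at a known DPI (no cropping, rotation, or MediaBox offset). The class `PdfCoords` is hypothetical. With iText 5, for example, you could then write invisible text on `PdfStamper.getUnderContent(page)`: set `PdfContentByte.TEXT_RENDER_MODE_INVISIBLE`, compute the font size from `BaseFont.getWidthPoint(text, 1f)` and the box width, and call `showTextAligned(Element.ALIGN_LEFT, text, box.llx, box.lly, 0)`. Treat that call sequence as a suggestion to check against your iText version.

```java
/** Hypothetical helper: maps OCR pixel boxes onto PDF user space. */
final class PdfCoords {

    /** A rectangle in PDF points (1/72 inch), origin at the bottom-left of the page. */
    static final class Box {
        final float llx, lly, urx, ury;

        Box(float llx, float lly, float urx, float ury) {
            this.llx = llx; this.lly = lly; this.urx = urx; this.ury = ury;
        }

        float width()  { return urx - llx; }
        float height() { return ury - lly; }
    }

    /** Converts a distance in pixels of an image scanned at dpi into points. */
    static float pxToPt(float px, float dpi) {
        return px * 72f / dpi;
    }

    /**
     * Converts a pixel box (origin top-left, y grows downward) into PDF points,
     * assuming the image is drawn at (0,0) and exactly fills a page of pageHeightPt.
     */
    static Box toPdf(int x0, int y0, int x1, int y1, float dpi, float pageHeightPt) {
        float llx = pxToPt(x0, dpi);
        float urx = pxToPt(x1, dpi);
        float ury = pageHeightPt - pxToPt(y0, dpi); // image top edge -> larger PDF y
        float lly = pageHeightPt - pxToPt(y1, dpi); // image bottom edge -> smaller PDF y
        return new Box(llx, lly, urx, ury);
    }

    /**
     * Font size at which a string whose width at font size 1 is widthAtSize1
     * spans exactly boxWidth points.
     */
    static float fitFontSize(float widthAtSize1, float boxWidth) {
        if (widthAtSize1 <= 0f) {
            throw new IllegalArgumentException("text width must be positive");
        }
        return boxWidth / widthAtSize1;
    }
}
```

For a Letter-size scan at 300 dpi (2550 x 3300 px), `pxToPt(3300, 300)` gives a 792 pt page height, and a word box from (300, 600) to (900, 660) px lands at roughly (72, 633.6) to (216, 648) pt. Using the box bottom as the baseline ignores descenders; for selectable invisible text that is usually close enough.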
Internally at iText we have also looked into OCR, and it is possible (using Tesseract).
Workflow:
There are many more optimizations you could do. A short list of suggestions:
This is not an easy task, but it is certainly possible.