I have an image file that contains text in columns separated by tabs (two spaces). But when I extract the text from this image, I always get a single space between the columns. An example:
IMAGE:
col-a  col-b  col-c
Desired output:
col-a  col-b  col-c
But I am getting the following:
col-a col-b col-c
I am using pytesseract.image_to_string (from the pytesseract Python module) to convert the image to text.
Pytesseract, or Python-tesseract, is an OCR tool for Python that serves as a wrapper for the Tesseract-OCR Engine. It can read and recognize text in images and is commonly used for image-to-text OCR in Python.
Optical Character Recognition (OCR) is a technology for recognizing text in images. It can convert typed, handwritten, or printed text into machine-readable text. To use OCR with pytesseract, you need to install and configure Tesseract on your computer. First, download the Tesseract OCR executables.
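As a rough sketch of the "install and configure" step, the snippet below checks whether a Tesseract executable is visible on PATH and, if so, points pytesseract at it. The helper names find_tesseract and configure_pytesseract are made up for this illustration; pytesseract.pytesseract.tesseract_cmd is the attribute pytesseract reads to locate the binary. It assumes Tesseract was installed somewhere on PATH, which may not hold on every system.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the path of the first matching executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def configure_pytesseract():
    """Point pytesseract at the located binary; raise if none is found.

    Hypothetical helper for illustration only.
    """
    import pytesseract  # imported lazily so find_tesseract works without it

    path = find_tesseract()
    if path is None:
        raise RuntimeError("tesseract executable not found on PATH")
    # Tell the wrapper which executable to call.
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

If Tesseract lives outside PATH (a common situation on Windows), the same attribute can be set to the full path of the executable by hand.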
Python-tesseract is an optical character recognition (OCR) tool for Python. That is, it will recognize and "read" the text embedded in images. Python-tesseract is a wrapper for Google's Tesseract-OCR Engine.
Tesseract is an open-source optical character recognition engine and one of the most popular OCR libraries. OCR uses artificial intelligence to find and recognize text in images. Tesseract looks for patterns in pixels, letters, words, and sentences.
Pass Tesseract's preserve_interword_spaces option through the config argument, like this:
pytesseract.image_to_string(img, config='-c preserve_interword_spaces=1')
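For context, here is a minimal sketch of how that call might sit in a complete script. It assumes pytesseract and Pillow are installed and that the Tesseract binary can be found. The function names ocr_preserving_spaces and split_columns, and the file name table.png in the usage note, are hypothetical and only serve the illustration. The column splitter is an extra convenience that is not part of the answer above; it treats a tab or a run of two or more whitespace characters as a column boundary.

```python
import re


def ocr_preserving_spaces(image_path):
    """Run Tesseract on an image and keep the gaps between words.

    The OCR libraries are imported lazily so split_columns can be used
    on its own without pytesseract or Pillow installed.
    """
    import pytesseract      # wrapper around the tesseract executable
    from PIL import Image   # Pillow, used to load the image file

    # preserve_interword_spaces=1 asks Tesseract to keep wide gaps
    # instead of collapsing them into a single space.
    config = "-c preserve_interword_spaces=1"
    return pytesseract.image_to_string(Image.open(image_path), config=config)


def split_columns(line):
    """Split one OCR line into cells on a tab or 2+ whitespace characters."""
    cells = re.split(r"\s{2,}|\t", line.strip())
    return [cell for cell in cells if cell]
```

A possible use would be to call ocr_preserving_spaces("table.png"), then apply split_columns to each line of the result. On the sample line from the question, split_columns("col-a  col-b  col-c") returns ['col-a', 'col-b', 'col-c']. How many spaces Tesseract actually emits for a given gap depends on the image, so the two-character threshold is a guess that may need tuning.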