Highly inconsistent OCR result for tesseract

Question

enter image description here

This is the original screenshot and I cropped the image into 4 parts and cleared the background of the image to the extent that I can possibly do but tesseract only detects the last column here and ignores the rest.

enter image description here

The output from the tesseract is shown as it is there are blank spaces which I remove while processing result

  Femme—Fatale.



  DaRkLoRdEIa
  aChineseN1gg4

  Noob_Diablo_

enter image description here

The output from the tesseract is shown as it is there are blank spaces which I remove while processing result

Kicked.

NosNoel
ChikiZD
Death_Eag|e_42

Chai—.

enter image description here

3579 10 1 7 148

2962 3 O 7 101

2214 2 2 7 99

2205 1 3 6 78

enter image description here

Am just dumping the output of

result = `pytesseract.image_to_string(Image.open("D:/newapproach/B&W"+str(i)+".jpg"),lang="New_Language")`

But I do not know how to proceed from here to get a consistent result.Is there anyway so that I can force the tesseract to recognize the text area and make it scan that.Because in trainer (SunnyPage), tesseract on default recognition scan it fails to recognize some areas but once I select the manually everything is detected and translated to text correctly

Code

Manoj · Accepted Answer

Tried with the command line which gives us option to decide which psm value to be used.

Can you try with this:

pytesseract.image_to_string(image, config='--psm 6')

Tried with the image provided by you and below is the result:

Extracted Text Out of Image

The only problem I am facing is that my tesseract dictionary is interpreting "1" provided in your image to ""I" .

Below is the list of psm options available:

pagesegmode values are: 0 = Orientation and script detection (OSD) only.

1 = Automatic page segmentation with OSD.

2 = Automatic page segmentation, but no OSD, or OCR

3 = Fully automatic page segmentation, but no OSD. (Default)

4 = Assume a single column of text of variable sizes.

5 = Assume a single uniform block of vertically aligned text.

6 = Assume a single uniform block of text.

7 = Treat the image as a single text line.

8 = Treat the image as a single word.

9 = Treat the image as a single word in a circle.

10 = Treat the image as a single character.

akshat dashore · Answer

I used this link

https://www.howtoforge.com/tutorial/tesseract-ocr-installation-and-usage-on-ubuntu-16-04/

Just use below commands that may increase accuracy upto 50%`

sudo apt update

sudo apt install tesseract-ocr

sudo apt-get install tesseract-ocr-eng

sudo apt-get install tesseract-ocr-all

sudo apt install imagemagick

convert -h

tesseract [image_path] [file_name]

convert -resize 150% [input_file_path] [output_file_path]

convert [input_file_path] -type Grayscale [output_file_path]

tesseract [image_path] [file_name]

It will only show bold letters

Thanks

Highly inconsistent OCR result for tesseract

Tags:

python

opencv

python-tesseract

pytesser

Code

codefreaK

2 Answers

Manoj

akshat dashore

Recent Activity

Donate For Us

Highly inconsistent OCR result for tesseract

Tags:

python

opencv

python-tesseract

pytesser

Code

codefreaK

2 Answers

Manoj

akshat dashore

Related questions

Recent Activity

Donate For Us