This is the original screenshot and I cropped the image into 4 parts and cleared the background of the image to the extent that I can possibly do but tesseract only detects the last column here and ignores the rest.
The output from the tesseract is shown as it is there are blank spaces which I remove while processing result
Femme—Fatale.
DaRkLoRdEIa
aChineseN1gg4
Noob_Diablo_
The output from the tesseract is shown as it is there are blank spaces which I remove while processing result
Kicked.
NosNoel
ChikiZD
Death_Eag|e_42
Chai—.
3579 10 1 7 148
2962 3 O 7 101
2214 2 2 7 99
2205 1 3 6 78
8212
7198
6307
5640
4884
15
40
40
6O
80
80
Am just dumping the output of
result = `pytesseract.image_to_string(Image.open("D:/newapproach/B&W"+str(i)+".jpg"),lang="New_Language")`
But I do not know how to proceed from here to get a consistent result.Is there anyway so that I can force the tesseract to recognize the text area and make it scan that.Because in trainer (SunnyPage), tesseract on default recognition scan it fails to recognize some areas but once I select the manually everything is detected and translated to text correctly
Tried with the command line which gives us option to decide which psm value to be used.
Can you try with this:
pytesseract.image_to_string(image, config='--psm 6')
Tried with the image provided by you and below is the result:
Extracted Text Out of Image
The only problem I am facing is that my tesseract dictionary is interpreting "1" provided in your image to ""I" .
Below is the list of psm options available:
pagesegmode values are: 0 = Orientation and script detection (OSD) only.
1 = Automatic page segmentation with OSD.
2 = Automatic page segmentation, but no OSD, or OCR
3 = Fully automatic page segmentation, but no OSD. (Default)
4 = Assume a single column of text of variable sizes.
5 = Assume a single uniform block of vertically aligned text.
6 = Assume a single uniform block of text.
7 = Treat the image as a single text line.
8 = Treat the image as a single word.
9 = Treat the image as a single word in a circle.
10 = Treat the image as a single character.
I used this link
https://www.howtoforge.com/tutorial/tesseract-ocr-installation-and-usage-on-ubuntu-16-04/
Just use below commands that may increase accuracy upto 50%`
sudo apt update
sudo apt install tesseract-ocr
sudo apt-get install tesseract-ocr-eng
sudo apt-get install tesseract-ocr-all
sudo apt install imagemagick
convert -h
tesseract [image_path] [file_name]
convert -resize 150% [input_file_path] [output_file_path]
convert [input_file_path] -type Grayscale [output_file_path]
tesseract [image_path] [file_name]
It will only show bold letters
Thanks
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With