Extracting code from photograph of T-shirt via OCR

I recently saw someone with a T-shirt with some Perl code on the back. I took a photograph of it and cropped out the code:

[image: cropped photograph of the code on the T-shirt]

Next I tried to extract the code from the image via OCR, so I installed Tesseract OCR and the Python bindings for it, pytesser.

Pytesser only works on TIFF images, so I converted the image to TIFF in GIMP and entered the following code (on Ubuntu 9.10):

>>> from pytesser import *
>>> image = Image.open('code.tif')
>>> print image_to_string(image)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pytesser.py", line 30, in image_to_string
    util.image_to_scratch(im, scratch_image_name)
  File "util.py", line 7, in image_to_scratch
    im.save(scratch_image_name, dpi=(200,200))
  File "/usr/lib/python2.6/dist-packages/PIL/Image.py", line 1406, in save
    save_handler(self, fp, filename)
  File "/usr/lib/python2.6/dist-packages/PIL/BmpImagePlugin.py", line 197, in _save
    raise IOError("cannot write mode %s as BMP" % im.mode)
IOError: cannot write mode RGBA as BMP
>>> r,g,b,a = image.split()
>>> img = Image.merge("RGB", (r,g,b))
>>> print image_to_string(img)
Tesseract Open Source OCR Engine
      éi     _   l_` _ t     ’   ‘" fY`     {  W       IKQW   ·  __·_  ‘ ·-»·              :W   Z     ··  I  A n   1               ;f              `    `       `T     .' V   _ ‘   I  {Z.; » ;,. , ;  y i-   4 : %:,,           `· »    V; ` ?     ‘,—·.     H***li¥v·•·}I§¢   ` _  »¢is5#__·¤G$++}§;“»‘7·   71   ’    Q  {  NH IQ   ytéggygi {     ;g¤qg;gm·;,g(g,,3) {3;;+-    § {Jf**$d$ }‘$p•¢L#d¤ Sc}   »   i `  i A1:
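The traceback above arises because the BMP writer cannot store an alpha channel. The session works around this by dropping the alpha band with `split`/`merge`. An alternative is to composite each pixel over a solid background first, so that transparent regions take the background colour instead of whatever colour data happens to lie underneath. The following is only a sketch on plain pixel tuples; the helper name `flatten_rgba` and the white default background are assumptions made for illustration and are not part of pytesser or PIL.

```python
def flatten_rgba(pixels, background=(255, 255, 255)):
    """Composite (r, g, b, a) tuples over an opaque background colour.

    Hypothetical helper: `pixels` is any iterable of 4-tuples with
    channel values in 0..255; the result is a list of (r, g, b) tuples.
    """
    flattened = []
    for r, g, b, a in pixels:
        alpha = a / 255.0  # 0.0 = fully transparent, 1.0 = fully opaque
        flattened.append(tuple(
            int(round(channel * alpha + bg * (1.0 - alpha)))
            for channel, bg in zip((r, g, b), background)
        ))
    return flattened
```

With PIL, tuples of this shape can be obtained from an RGBA image via `list(image.getdata())`. If the alpha channel carries no useful information, `image.convert("RGB")` is the shorter route and has the same effect as the `split`/`merge` step in the session.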

The OCR engine's output is clearly gibberish. So, my questions are:

  • What do I have to do to get better OCR results out of Tesseract?
  • Or, does anybody else have better luck extracting the code from the above image in another way?
BioGeek asked Mar 10 '10 16:03



2 Answers

Pre-processing will definitely yield a more workable image.

For example, here is the result of Gimp "Levels", "Difference-of-Gaussians", and "Levels" filters on the image.

[image: pre-processed image]
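For readers who would rather script this than click through GIMP, here is a rough pure-Python sketch of the idea behind a levels stretch and a difference-of-Gaussians filter on a grayscale image stored as a list of rows. It is not GIMP's implementation: the function names, the sigma values, and the edge clamping are all assumptions chosen for illustration, and a real image would call for an image library, since this is slow on anything large.

```python
import math


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian weights covering about three sigma each side."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, weight in enumerate(kernel):
                xx = min(max(x + k - radius, 0), width - 1)
                acc += weight * row[xx]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def transpose(image):
    return [list(column) for column in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable blur: rows first, then columns via a transpose."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def difference_of_gaussians(image, sigma_small=1.0, sigma_large=2.0):
    """Narrow blur minus wide blur; flat regions come out near zero."""
    narrow = gaussian_blur(image, sigma_small)
    wide = gaussian_blur(image, sigma_large)
    return [[n - w for n, w in zip(row_n, row_w)]
            for row_n, row_w in zip(narrow, wide)]


def levels(image, low=None, high=None):
    """Linearly stretch values so that low maps to 0 and high to 255."""
    flat = [value for row in image for value in row]
    low = min(flat) if low is None else low
    high = max(flat) if high is None else high
    if high == low:
        return [[0 for _ in row] for row in image]
    scale = 255.0 / (high - low)
    return [[int(round(min(max((value - low) * scale, 0.0), 255.0)))
             for value in row] for row in image]
```

A call such as `levels(difference_of_gaussians(levels(gray)))` mirrors the order of filters named in the answer. The band-pass effect of the middle step suppresses slow brightness changes such as folds and shadows in the fabric, while keeping stroke-sized detail.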

Joe Koberg answered Sep 22 '22 19:09



You can probably type faster than you can clean up images and install OCR engines:

#!/usr/bin/perl
(my$d=q[AA                GTCAGTTCCT
  CGCTATGTA                 ACACACACCA
    TTTGTGAGT                ATGTAACATA
      CTCGCTGGC              TATGTCAGAC
        AGATTGATC          GATCGATAGA
          ATGATAGATC     GAACGAGTGA
            TAGATAGAGT GATAGATAGA
              GAGAGA GATAGAACGA
                TC GATAGAGAGA
                 TAGATAGACA G
               ATCGAGAGAC AGATA
             GAACGACAGA TAGATAGAT
           TGAGTGATAG    ACTGAGAGAT
         AGATAGATTG        ATAGATAGAT
       AGATAGATAG           ACTGATAGAT
     AGAGTGATAG             ATAGAATGAG
   AGATAGACAG               ACAGACAGAT
  AGATAGACAG               AGAGACAGAT
  TGATAGATAG             ATAGATAGAT
  TGATAGATAG           AATGATAGAT
   AGATTGAGTG        ACAGATCGAT
     AGAACCTTTCT   CAGTAACAGT
       CTTTCTCGC TGGCTTGCTT
         TCTAA CAACCTTACT
           G ACTGCCTTTC
           TGAGATAGAT CGA
         TAGATAGATA GACAGAC
       AGATAGATAG  ATAGAATGAC
     AGACAGAGAG      ACAGAATGAT
   CGAGAGACAG          ATAGATAGAT
  AGAATGATAG             ACAGATAGAC
  AGATAGATAG               ACAGACAGAT
  AGACAGACTG                 ATAGATAGAT
   AGATAGATAG                 AATGACAGAT
     CGATTGAATG               ACAGATAGAT
       CGACAGATAG             ATAGACAGAT
         AGAGTGATAG          ATTGATCGAC
           TGATTGATAG      ACTGATTGAT
             AGACAGATAG  AGTGACAGAT
               CGACAGA TAGATAGATA
                 GATA GATAGATAG
                    ATAGACAGA G
                  AGATAGATAG ACA
                GTCGCAAGTTC GCTCACA
])=~s/\s+//g;%a=map{chr $_=>$i++}65,84,67,
71;$p=join$;,keys%a;while($d=~/([$p]{4})/g
){next if$j++%96>=16;$c=0;for$d(0..3){$c+=
$a{substr($1,$d,1)}*(4**$d)}$perl.=chr $c}
             eval $perl;

Edit: typo.
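The decoding loop at the end of the Perl program is easier to follow as a Python port. As I read the Perl, it strips whitespace, maps A, T, C, G to the base-4 digits 0, 1, 2, 3 (the order of the character codes 65, 84, 67, 71), treats each run of four bases as a little-endian base-4 number giving one character code, and keeps only the first 16 groups out of every 96. The names `decode` and `encode` are my own, `encode` is added purely for round-trip checking, and unlike the Perl this sketch returns the decoded text instead of passing it to `eval`.

```python
import re

BASES = "ATCG"  # digit values 0..3, following 65,84,67,71 in the Perl source
DIGIT = dict((base, value) for value, base in enumerate(BASES))


def decode(dna, keep=16, period=96):
    """Decode groups of four bases into characters.

    Mirrors `next if $j++ % 96 >= 16`: only groups whose index modulo
    `period` is below `keep` contribute a character.
    """
    dna = re.sub(r"\s+", "", dna)
    chars = []
    for index, match in enumerate(re.finditer(r"[ATCG]{4}", dna)):
        if index % period >= keep:
            continue
        # Little-endian base 4: the first base is the least significant digit.
        code = sum(DIGIT[base] * 4 ** position
                   for position, base in enumerate(match.group(0)))
        chars.append(chr(code))
    return "".join(chars)


def encode(text):
    """Inverse of one decoded group per character (codes 0..255 only)."""
    groups = []
    for ch in text:
        code = ord(ch)
        if code > 255:
            raise ValueError("character does not fit in four base-4 digits")
        groups.append("".join(BASES[(code >> (2 * position)) & 3]
                              for position in range(4)))
    return "".join(groups)
```

For instance, the first twenty bases of the data above, `AAGTCAGTTCCTCGCTATGT`, decode to `print`. That makes a quick check that a hand-typed or OCR'd transcription of the shirt starts out correctly.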

ЯegDwight answered Sep 23 '22 19:09
