Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Read text in image with PHP

I'm trying to read the text from this image:

image

I want to read the price, e.g. "EUR42721.92"

I tried these libraries:

  1. How to Create a PHP Captcha Decoder with PHP OCR Class: Recognize text & objects in graphical images - PHP Classes
  2. phpOCR: Optical Character Recognizer written in PHP

But they don't work. How can I read the text?

like image 895
Bora Avatar asked Nov 30 '12 10:11

Bora


1 Answers

Try this (it worked with me):

$imagick = new Imagick($filePath);

$size = $imagick->getImageGeometry();
$width     = $size['width'];
$height    = $size['height'];
unset($size);

$textBottomPosition = $height-1;
$textRightPosition = $width;

$black = new ImagickPixel('#000000');
$gray  = new ImagickPixel('#C0C0C0');

$textRight  = 0;
$textLeft   = 0;
$textBottom = 0;
$textTop    = $height;

$foundGray = false;

for($x= 0; $x < $width; ++$x) {
    for($y = 0; $y < $height; ++$y) {
        $pixel = $imagick->getImagePixelColor($x, $y);
        $color = $pixel->getColor();
        // remove alpha component
        $pixel->setColor('rgb(' . $color['r'] . ','
                         . $color['g'] . ','
                         . $color['b'] . ')');

        // find the first gray pixel and ignore pixels below the gray
        if( $pixel->isSimilar($gray, .25) ) {
            $foundGray = true;
            break;
        }

        // find the text boundaries 
        if( $foundGray && $pixel->isSimilar($black, .25) ) {
            if( $textLeft === 0 ) {
                $textLeft = $x;
            } else {
                $textRight = $x;
            }

            if( $y < $textTop ) {
                $textTop = $y;
            }

            if( $y > $textBottom ) {
                $textBottom = $y;
            }
        }
    }
}

$textWidth = $textRight - $textLeft;
$textHeight = $textBottom - $textTop;
$imagick->cropImage($textWidth+10, $textHeight+10, $textLeft-5, $textTop-5);
$imagick->scaleImage($textWidth*10, $textHeight*10, true);

$textFilePath = tempnam('/temp', 'text-ocr-') . '.png';
$imagick->writeImage($textFilePath);

$text = str_replace(' ', '', shell_exec('gocr ' . escapeshellarg($textFilePath)));
unlink($textFilePath);
var_dump($text);

You need ImageMagick extension and GOCR installed to run it. If you can't or don't want to install the ImageMagick extension, I'll send you a GD version with a function to calculate colors distances (it's just an extended Pythagorean Theorem).

Don't forget to set the $filePath value.

image parsing for cropping visualization

The image shows that it looks for a gray pixel to change the $foundGray flag. After that, it looks for the first and last pixels from the left and from the top. It crops the image with some padding, the resulting image is resized and it's saved to a temporary file. After that, it's easy to use gocr (or any other OCR command or library). The temporary file can be removed after that.

like image 130
Pedro Amaral Couto Avatar answered Oct 07 '22 21:10

Pedro Amaral Couto