I'm trying to read the text from this image:
I want to read the price, e.g. "EUR42721.92".
I tried these libraries:
But they don't work. How can I read the text?
Try this (it worked for me):
$imagick = new Imagick($filePath);
$size = $imagick->getImageGeometry();
$width = $size['width'];
$height = $size['height'];
unset($size);

$black = new ImagickPixel('#000000');
$gray = new ImagickPixel('#C0C0C0');

// text bounding box, filled in while scanning
$textRight = 0;
$textLeft = 0;
$textBottom = 0;
$textTop = $height;
$foundGray = false;

// scan column by column, each column from top to bottom
for ($x = 0; $x < $width; ++$x) {
    for ($y = 0; $y < $height; ++$y) {
        $pixel = $imagick->getImagePixelColor($x, $y);
        $color = $pixel->getColor();
        // remove alpha component
        $pixel->setColor('rgb(' . $color['r'] . ','
            . $color['g'] . ','
            . $color['b'] . ')');
        // find the first gray pixel and ignore pixels below the gray
        if ($pixel->isSimilar($gray, .25)) {
            $foundGray = true;
            break;
        }
        // find the text boundaries
        if ($foundGray && $pixel->isSimilar($black, .25)) {
            if ($textLeft === 0) {
                $textLeft = $x;
            } else {
                $textRight = $x;
            }
            if ($y < $textTop) {
                $textTop = $y;
            }
            if ($y > $textBottom) {
                $textBottom = $y;
            }
        }
    }
}

$textWidth = $textRight - $textLeft;
$textHeight = $textBottom - $textTop;

// crop to the text with 5px of padding, then enlarge it for the OCR
$imagick->cropImage($textWidth + 10, $textHeight + 10, $textLeft - 5, $textTop - 5);
$imagick->scaleImage($textWidth * 10, $textHeight * 10, true);

// tempnam() creates an empty file; keep its name so it can be removed too
$tempBasePath = tempnam(sys_get_temp_dir(), 'text-ocr-');
$textFilePath = $tempBasePath . '.png';
$imagick->writeImage($textFilePath);

$text = str_replace(' ', '', (string) shell_exec('gocr ' . escapeshellarg($textFilePath)));

unlink($textFilePath);
unlink($tempBasePath);

var_dump($text);
You need the Imagick (ImageMagick) PHP extension and GOCR installed to run it. If you can't or don't want to install the Imagick extension, I can send you a GD version with a function to calculate color distances (it's just an extended Pythagorean theorem).
Don't forget to set the $filePath value.
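For reference, the color-distance function mentioned above could look roughly like this. It is only a sketch, not the GD version offered in the answer: colorDistance and isSimilarColor are hypothetical names, and scaling each channel to 0..1 is an assumption made so that the .25 threshold behaves like the one passed to isSimilar. The arrays use the same 'r', 'g', 'b' keys as ImagickPixel::getColor(); with GD you would first have to map the 'red', 'green', 'blue' keys returned by imagecolorsforindex().

```php
<?php
// Hypothetical helper (not from the original answer): Euclidean distance
// between two RGB colors, i.e. the Pythagorean theorem in three dimensions.
// Assumption: channels are 0..255 and are scaled to 0..1 before measuring.
function colorDistance(array $a, array $b): float
{
    $dr = ($a['r'] - $b['r']) / 255;
    $dg = ($a['g'] - $b['g']) / 255;
    $db = ($a['b'] - $b['b']) / 255;

    return sqrt($dr * $dr + $dg * $dg + $db * $db);
}

// Hypothetical stand-in for isSimilar(): true when the two colors are
// no further apart than $fuzz.
function isSimilarColor(array $a, array $b, float $fuzz): bool
{
    return colorDistance($a, $b) <= $fuzz;
}

$black = ['r' => 0, 'g' => 0, 'b' => 0];
$gray  = ['r' => 192, 'g' => 192, 'b' => 192];

var_dump(isSimilarColor(['r' => 20, 'g' => 20, 'b' => 20], $black, .25)); // bool(true)
var_dump(isSimilarColor($black, $gray, .25));                             // bool(false)
```

The largest possible distance with this scaling is the square root of 3 (black to white), so a threshold of .25 only accepts colors fairly close to the reference.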
As the image shows, the code first looks for a gray pixel and sets the $foundGray flag. After that, it records the leftmost, rightmost, topmost and bottommost black pixels, which give the boundaries of the text. It then crops the image to those boundaries with some padding, enlarges the result and saves it to a temporary file. From there it's easy to run gocr (or any other OCR command or library) on that file. The temporary file can be removed afterwards.