
PHP: Determine Visually Corrupted Images (yet valid) downloaded via Curl with GD/Imagemagick

I'm using Curl via Proxies to download images with a scraper I have developed.

Unfortunately, it occasionally fetches an image that looks like the ones below; the last one is completely blank :/

[Images: "3/4 corrupted", "dog corrupted", "room corrupted", "completely white"]

  • When I test the images via imagemagick (using identify) it tells me they are valid images.
  • When I test the images via exif_imagetype() and imagecreatefromjpeg() again, both these functions tell me the images are valid.

Does anyone have a way to determine whether an image is mostly grey or completely blank/white, so that these corrupted images can be detected?

I have done a lot of checking with other questions on here, but I haven't had much luck with other solutions. So please take care in suggesting this is a duplicate.

Thanks


After learning about imagecolorat(), I did a search and stumbled on some code. I came up with this:

<?php

$file = dirname(__FILE__) . "/images/1.jpg";

// imagecreatefromjpeg() returns false if the file cannot be decoded at all.
$img = imagecreatefromjpeg($file);
if ($img === false)
{
    exit("image unreadable \n");
}

$imagew = imagesx($img);
$imageh = imagesy($img);

// Only the last 5 rows of the image are sampled.
$first_row = max(0, $imageh - 5);

$foo = array();

// Valid coordinates are 0..width-1 and 0..height-1, so loop with "<".
// Looping with "<=" reads one pixel past each edge.
for ($x = 0; $x < $imagew; $x++)
{
    for ($y = $first_row; $y < $imageh; $y++)
    {
        $rgb = imagecolorat($img, $x, $y);

        // Only the red channel is inspected.
        $r = ($rgb >> 16) & 0xFF;

        // Pixels with a red value of 0 are left out of the tally.
        if ($r != 0)
        {
            $foo[] = $r;
        }
    }
}

// Red value => number of sampled pixels with that value.
$bar = array_count_values($foo);

$gray = ($bar[127] ?? 0) + ($bar[128] ?? 0) + ($bar[129] ?? 0);
$total = count($foo);
$other = $total - $gray;

if ($gray > $other)
{
    echo "image corrupted \n";
}
else
{
    echo "image not corrupted \n";
}
?>

Does anyone see potential pitfalls with this? The idea is to read the last few rows of the image and compare the number of pixels whose red value is 127, 128 or 129 (the gray filler) against the number of pixels with any other red value. If the gray pixels outnumber the rest, the image is most likely corrupted.
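One pitfall worth noting: the check above looks only at the red channel, and it does not cover the completely white image. A minimal sketch of a stricter variant follows, assuming a truecolor GD image (which is what imagecreatefromjpeg() returns). The function name `bottomRowsLookCorrupted`, the 5-row default and the 50% threshold are illustrative choices, not part of the original code.

```php
<?php
// Hypothetical helper: inspects the last $rows rows of a truecolor GD image.
// Flags the image when more than half of the sampled pixels are filler gray
// (all three channels in 127..129) or pure white (255,255,255).
function bottomRowsLookCorrupted($img, int $rows = 5): bool
{
    $width    = imagesx($img);
    $height   = imagesy($img);
    $firstRow = max(0, $height - $rows);

    $suspect = 0;
    $total   = 0;

    for ($y = $firstRow; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $rgb = imagecolorat($img, $x, $y);
            $r = ($rgb >> 16) & 0xFF;
            $g = ($rgb >> 8) & 0xFF;
            $b = $rgb & 0xFF;

            // Gray only counts when all three channels agree.
            $isGray  = min($r, $g, $b) >= 127 && max($r, $g, $b) <= 129;
            $isWhite = $r === 255 && $g === 255 && $b === 255;

            if ($isGray || $isWhite) {
                $suspect++;
            }
            $total++;
        }
    }

    return $total > 0 && $suspect > $total / 2;
}
```

A legitimate photo with a white or mid-gray background along its bottom edge would also be flagged, so treat a positive result as "needs a second look" (for example, re-download through another proxy) rather than as proof of corruption.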

Opinions welcome! :)

PaulM asked Jan 24 '12 22:01


1 Answer

I found this page when looking for a way to detect visually corrupted images like these. Here is a way to solve the problem using bash (the convert command line can easily be adapted for PHP or Python):

convert INPUTFILEPATH -gravity SouthWest -crop 20%x1%   -format %c  -depth 8  histogram:info:- | sed '/^$/d'  | sort -V | head -n 1 | grep fractal | wc -l

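It crops a thin strip from the south-west corner of the picture (20% of the width by 1% of the height), then gets the color histogram of that strip. If the histogram entry picked by `sort -V | head -n 1` (intended to be the main color of the strip) has the name "fractal", which is ImageMagick's name for the mid-gray #808080, instead of an RGB color, that zone is corrupted. The output is 1 in that case and 0 otherwise.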
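For the PHP adaptation mentioned above, here is a rough sketch that uses GD instead of shelling out to convert. It assumes a truecolor image (as returned by imagecreatefromjpeg()) and PHP 7.3+ for array_key_first(). The function name `bottomStripIsGray` is made up for illustration, and the strip size only approximates what `-crop 20%x1%` produces.

```php
<?php
// Hypothetical helper (not part of the command above): returns true when the
// most frequent color in the bottom-left strip of $img is mid-gray #808080.
function bottomStripIsGray($img): bool
{
    $width  = imagesx($img);
    $height = imagesy($img);

    // Roughly 20% of the width by 1% of the height, at least 1 pixel each.
    $stripW = max(1, (int) round($width * 0.20));
    $stripH = max(1, (int) round($height * 0.01));

    // Histogram: packed RGB value => number of pixels in the strip.
    $histogram = [];
    for ($y = $height - $stripH; $y < $height; $y++) {
        for ($x = 0; $x < $stripW; $x++) {
            $rgb = imagecolorat($img, $x, $y) & 0xFFFFFF; // drop alpha bits
            $histogram[$rgb] = ($histogram[$rgb] ?? 0) + 1;
        }
    }

    // Sort by pixel count, most frequent color first.
    arsort($histogram);
    $dominant = array_key_first($histogram);

    return $dominant === 0x808080;
}
```

It could be called as `bottomStripIsGray(imagecreatefromjpeg($path))`, where `$path` stands for a downloaded file. Unlike the shell pipeline, this matches only exactly (128,128,128); JPEG artifacts may shift the gray by a level or two, so a small tolerance may be needed in practice.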

Hope this helps!

TheFargue answered Sep 20 '22 17:09