
Sorting 10,000 images by color

I have 10,000 images that I want to sort by color to make into a print.

I'm getting pretty far. I've averaged their color so now I have two directories: one with all the original images (original_images/), and one with equally named jpegs of their average color (averages/).

Next, I use PHP to sort the average images:

// $images is an array with all the filenames.
$sorted_images = array();
$duplicates = 0; // counts hue keys that were already taken
$loop_limit = count($images);
for ($i = 0; $i < $loop_limit; $i++) {
    $image = imagecreatefromjpeg("averages/" . $images[$i]);
    $rgb = imagecolorat($image, 50, 50);
    imagedestroy($image);
    // split the packed color value into its red, green and blue bytes
    $r = ($rgb >> 16) & 0xFF;
    $g = ($rgb >> 8) & 0xFF;
    $b = $rgb & 0xFF;
    $hsv = rgb_to_hsv($r, $g, $b); // function to convert rgb to Hue/Sat/Value
    $h = (string) $hsv['H']; // string key, so the fractional part is kept
    if (isset($sorted_images[$h])) {
        $duplicates++;
        echo("oh no! " . $h . " is a dupe! found " . $duplicates . " duplicates so far.<br>");
    }
    $sorted_images[$h] = $images[$i];
}

// sort the array by key:
ksort($sorted_images, SORT_NUMERIC);

Edit: the problem is that the keys $h range from (apparently) -0.1666666667 to somewhere around 1. My gut says the chances of duplicate values are really small, but in fact there turn out to be over 6000 duplicate keys. I tried casting the $h value to a string because I thought the array keys might be rounded.

That didn't work though. This is the function to convert rgb to HSV. I found it somewhere without any documentation...

function RGB_TO_HSV ($R, $G, $B) { 
    $HSV = array();

    $var_R = ($R / 255);
    $var_G = ($G / 255);
    $var_B = ($B / 255);

    $var_Min = min($var_R, $var_G, $var_B);
    $var_Max = max($var_R, $var_G, $var_B);
    $del_Max = $var_Max - $var_Min;

    $V = $var_Max;

    if ($del_Max == 0)
    {
        $H = 0;
        $S = 0;
    }
    else
    {
        $S = $del_Max / $var_Max;

        $del_R = ( ( ( $max - $var_R ) / 6 ) + ( $del_Max / 2 ) ) / $del_Max;
        $del_G = ( ( ( $max - $var_G ) / 6 ) + ( $del_Max / 2 ) ) / $del_Max;
        $del_B = ( ( ( $max - $var_B ) / 6 ) + ( $del_Max / 2 ) ) / $del_Max;

        if ($var_R == $var_Max) $H = $del_B - $del_G;
        else if ($var_G == $var_Max) $H = ( 1 / 3 ) + $del_R - $del_B;
        else if ($var_B == $var_Max) $H = ( 2 / 3 ) + $del_G - $del_R;

        if (H<0) $H++;
        if (H>1) $H--;
    }

    $HSV['H'] = $H;
    $HSV['S'] = $S;
    $HSV['V'] = $V;

    return $HSV;
}

So the questions now are:

  1. Is the rgb_to_hsv()-function correct?
  2. How can I make sure that keys aren't overwritten in the array, but the values are (closely) maintained? For instance, if two images have a $h-value of 0.01111111111, when the second one is pushed to the array, its key should be 0.01111111112?
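
On question 1, two details in the function stand out. `$max` is never defined (only `$var_Max` is), and the last two checks test `H` without the `$`. Older PHP versions treat a bare `H` as the string "H" and PHP 8 raises an error, so the wrap into 0..1 presumably never runs, which would explain hue values down to -1/6. The sketch below is one way to write the conversion with those points changed. The name `rgb_to_hsv_fixed` is hypothetical, and the hue formula is simplified on the assumption that the `$max` terms cancel in each difference such as `$del_B - $del_G`.

```php
<?php
// Sketch of an RGB to HSV conversion. Inputs are 0..255,
// the returned H, S and V values are all in the range 0..1.
// The function name is made up for this example.
function rgb_to_hsv_fixed(int $r, int $g, int $b): array
{
    $var_r = $r / 255;
    $var_g = $g / 255;
    $var_b = $b / 255;

    $var_min = min($var_r, $var_g, $var_b);
    $var_max = max($var_r, $var_g, $var_b);
    $del_max = $var_max - $var_min;

    $h = 0.0; // gray has no hue, so 0 is used by convention
    $s = 0.0;

    if ($del_max > 0) {
        $s = $del_max / $var_max;

        // Hue depends on which channel is the largest one.
        if ($var_r == $var_max) {
            $h = ($var_g - $var_b) / (6 * $del_max);
        } elseif ($var_g == $var_max) {
            $h = 1 / 3 + ($var_b - $var_r) / (6 * $del_max);
        } else {
            $h = 2 / 3 + ($var_r - $var_g) / (6 * $del_max);
        }

        // Wrap the hue back into 0..1 (note the $ in front of h).
        if ($h < 0) {
            $h += 1;
        }
        if ($h > 1) {
            $h -= 1;
        }
    }

    return ['H' => $h, 'S' => $s, 'V' => $var_max];
}
```

With the wrap in place, a color such as magenta (255, 0, 255) ends up at 5/6 instead of -1/6.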

(old edits:) edit: I've changed rename() to copy() so that I don't have to reupload 10,000 images every time it goes wrong ;-). I've also used ini_set("max_execution_time", 300); to bump the max exec time from 60 to 300, added imagedestroy($image) to decrease memory usage, and improved the for-loop by changing $i < count($images) to $loop_limit = count($images).

edit 2: Okay, so I've found a problem. The $h (Hue) value for the images is the same every now and then, so using $sorted_images[$h] = $images[$i] overwrites the value for that key in the array. In fact, there turn out to be over 6000 duplicate values... How would I go about fixing that without messing with the $h-value too much?
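
One way around the overwriting, offered as a sketch rather than the only fix, is to stop using the hue as the array key. The filenames are unique, so the hue can be stored as the value under the filename and the array sorted by value with asort(), which keeps the keys. The sample data and the name `$hue_by_file` below are made up for illustration; in the real script the values would come from rgb_to_hsv() inside the loop.

```php
<?php
// Hypothetical hue values keyed by filename.
$hue_by_file = [
    'a.jpg' => 0.5,
    'b.jpg' => 0.0111,
    'c.jpg' => 0.5,  // same hue as a.jpg, but it no longer overwrites it
    'd.jpg' => 0.25,
];

// Sort by value (the hue) while keeping the filename keys.
asort($hue_by_file, SORT_NUMERIC);

// The filenames, ordered by ascending hue.
$sorted_images = array_keys($hue_by_file);
```

Images with an identical hue simply end up next to each other, and no $h value has to be nudged to make it unique.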

asked Nov 03 '22 14:11 by Rein


1 Answer

Have you tried enabling error messages?

error_reporting(E_ALL);
ini_set('display_errors', 1);

As for the local vs master values: 'local' means that the script currently being run is using a timeout of 300 seconds. 'master' applies to all other requests (unless explicitly modified).

Cron would be a way to go, but I don't think this needs to be executed repeatedly every X seconds/minutes/hours. You can simply run it from the command line yourself. Look here for more information: http://www.php.net/manual/en/features.commandline.usage.php

Seeing as the script works, it's most likely one of the following issues:

  1. memory_limit not high enough. Should give a PHP error with errors enabled.
  2. Execution time not high enough. Should give a PHP error with errors enabled.

Use ini_set() to increase both. If you 'just' want the script to run, set the timeout to 0 seconds and the memory limit as high as you can go. If you want to learn the exact cause, you might look into 'xdebug' to see whether there are any memory leaks or which commands take the longest to execute. Looking at the code, I'll assume it's the copy command taking a while to execute (more than 1ms, which is a lot after 10000 iterations).
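
As a minimal sketch of that suggestion, the lines below lift both limits for the current script only. The '512M' figure is an arbitrary example value, not a recommendation, and some hosts may not allow these settings to be changed at run time.

```php
<?php
// Remove the execution time limit for this script (0 means no limit).
set_time_limit(0);

// Raise the memory limit; '512M' is just an example value.
ini_set('memory_limit', '512M');
```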

If these values cannot be modified, or you simply want to toy around with high-memory, long-running scripts on limited resources, try rewriting the script to do the renaming in batches and set a cron to execute the script every X minutes (just remove the cron when all images are done).
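
A rough sketch of the batch idea, under a few assumptions: the function name `process_batch`, the state file and the callback are all hypothetical, and the position is simply stored as a number in a small text file between runs.

```php
<?php
// Handle at most $batch_size images per run and remember how far we got
// in $state_file, so that a cron job can call the script repeatedly.
// Returns the number of images that still have to be processed.
function process_batch(array $images, string $state_file, int $batch_size, callable $work): int
{
    // Read the offset left by the previous run (0 on the first run).
    $offset = is_file($state_file) ? (int) file_get_contents($state_file) : 0;

    // Take the next slice of filenames and process each one.
    $batch = array_slice($images, $offset, $batch_size);
    foreach ($batch as $image) {
        $work($image);
    }

    // Store the new offset for the next run.
    $offset += count($batch);
    file_put_contents($state_file, (string) $offset);

    return count($images) - $offset;
}
```

The callback would contain the per-image work, for example the copy() call, and the cron entry can be removed once the function returns 0.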

Good luck :)

answered Nov 08 '22 04:11 by Tjirp