Hash/digest with a fixed (small) number of possible values

I need to map a string to one of only 16 (or some other small number of) possible hash values, so that I can color-code contacts by name.

I tried taking a crc32 hash and using its first character, which is a hex digit:

$contact = 'Robin Hood';
$colors = [
     '0' => 'F8BBD0',
     '1' => 'E1BEE7',
     ...
     'e' => 'D7CCC8',
     'f' => 'CFD8DC',
];
$firstLetter = hash('crc32', $contact)[0];
return '#' . $colors[$firstLetter];

However, I have doubts about how well this method distributes the values. How can I get a digest with a small, fixed number of possible values from a string?

asked Mar 31 '16 08:03 by vatavale



1 Answer

If your primary concern is good distribution I would use a good pseudo random engine:

$colorKey = bin2hex(openssl_random_pseudo_bytes(1))[0];
return '#' . $colors[$colorKey];

I can't say how this compares to the distribution of the first character of a hash. Note that each call returns a different value, so this only fits if the color does not have to be reproducible from the name.
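If the color has to stay the same for a given name, and the number of colors is not necessarily 16, one option is to reduce an integer hash modulo the palette size. The following is only a sketch: `colorIndex` is a hypothetical helper name, the four-color palette is an example, and it uses PHP's built-in `crc32()` function, which is the same checksum as `hash('crc32b', ...)` and not the `hash('crc32', ...)` variant from the question, so the resulting colors will differ from the first-hex-digit approach.

```php
<?php
// Hypothetical helper: map a string to an index in 0 .. $buckets - 1.
function colorIndex(string $name, int $buckets): int
{
    // crc32() can return a negative integer on 32-bit PHP builds,
    // so mask to 31 bits before taking the modulo.
    return (crc32($name) & 0x7FFFFFFF) % $buckets;
}

// Example palette (values borrowed from the question; any length works).
$palette = ['F8BBD0', 'E1BEE7', 'D7CCC8', 'CFD8DC'];

$contact = 'Robin Hood';
$color = '#' . $palette[colorIndex($contact, count($palette))];
echo $color, "\n";
```

The same name always yields the same index, and changing the palette length only requires changing the array.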


Edit: Since you require a hash, I tested various hashes against a text file of 500 random names and found crc32 to have the most even distribution. I don't know how many names you expect or how good a distribution you require, but your solution seems like a good choice to me:

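<?php
// Squared deviation of a single value from the mean.
function sd_square($x, $mean) { return pow($x - $mean, 2); }

// Sample standard deviation (divides by n - 1).
function sd($array) {
    $n = count($array);
    $mean = array_sum($array) / $n;
    $squares = array_map("sd_square", $array, array_fill(0, $n, $mean));
    return sqrt(array_sum($squares) / ($n - 1));
}

// One counter per hex digit 0..f.
$crcColorCounts = array_fill_keys(range(0, 15), 0);
$file = fopen('random_names.txt', 'r');
// Note: fgets() keeps the trailing newline, so it is part of the hashed string.
while ($contact = fgets($file)) {
    $hash = hash('crc32', $contact);
    $letter = $hash[0];  // first hex digit of the digest
    $crcColorCounts[hexdec($letter)]++;
}
fclose($file);
print_r($crcColorCounts);
echo 'Standard deviation: ', sd($crcColorCounts);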

Output:

Array
(
    [0] => 32
    [1] => 26
    [2] => 33
    [3] => 31
    [4] => 40
    [5] => 29
    [6] => 33
    [7] => 20
    [8] => 33
    [9] => 30
    [10] => 39
    [11] => 27
    [12] => 36
    [13] => 33
    [14] => 29
    [15] => 30
)
Standard deviation: 4.8815127436755

(Standard deviation function taken from this answer.)

I also tried MD5, SHA1, and the last character of CRC32. All returned standard deviations around 7 on this sample, making the first character of CRC32 the winner.
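A comparison like this can be repeated with a small loop over algorithms and character positions. This is a minimal sketch, not the exact script used for the numbers above: `sampleSd` and `digitCounts` are hypothetical helper names, and the generated `Contact N` strings are placeholder data standing in for a real list of names, so the printed values will not match the figures reported here.

```php
<?php
// Sample standard deviation of a list of counts (n - 1 in the denominator).
function sampleSd(array $counts): float
{
    $n = count($counts);
    $mean = array_sum($counts) / $n;
    $sumSquares = 0.0;
    foreach ($counts as $c) {
        $sumSquares += ($c - $mean) ** 2;
    }
    return sqrt($sumSquares / ($n - 1));
}

// Count how often each hex digit (0..15) appears at position $pos of the digest.
// A negative $pos counts from the end, so -1 is the last character.
function digitCounts(array $names, string $algo, int $pos): array
{
    $counts = array_fill(0, 16, 0);
    foreach ($names as $name) {
        $hex = hash($algo, $name);
        $digit = $pos >= 0 ? $hex[$pos] : $hex[strlen($hex) + $pos];
        $counts[hexdec($digit)]++;
    }
    return $counts;
}

// Placeholder data (assumption): replace with real contact names.
$names = [];
for ($i = 0; $i < 500; $i++) {
    $names[] = "Contact $i";
}

// Each entry is [algorithm, character position].
foreach ([['crc32', 0], ['crc32', -1], ['md5', 0], ['sha1', 0]] as [$algo, $pos]) {
    $sd = sampleSd(digitCounts($names, $algo, $pos));
    printf("%-6s pos %2d: sd = %.3f\n", $algo, $pos, $sd);
}
```

Running it against several different name lists gives a better idea of whether one algorithm is consistently more even than another, or whether the difference comes from the particular sample.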

answered Nov 20 '22 10:11 by Matt S
