
Very fast hash function for hashing 8-16 byte strings

I need a very fast string hashing function that fits well with a web application written in PHP.

The problem I am trying to solve is assigning IDs to permissions in an access control system. I am thinking about using hashed strings to represent permission IDs. This way I will be able to check permissions like this:

if ($Auth->isAllowed($user, "blog.comment")) {
    // Do some operation
}
...

if ($Auth->isAllowed($user, "profile.avatar.change")) {
    // Do some other operation
}

The DB table will map permission hashes to user roles. To check that the user is allowed to do "profile.avatar.change", the corresponding string will be hashed and looked up in the DB table.

This is very handy and there will be no need to worry about maintaining unique permission IDs among different modules. But the hashing function should be very efficient.
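
The lookup described above might be sketched as follows. This is only an illustration under stated assumptions: the `Auth` class body, the `grant()` and `permissionId()` helpers, and the in-memory array standing in for the DB table are all hypothetical, `isAllowed()` takes the user's roles directly instead of a user object, and `crc32b` is just one of PHP's built-in algorithms picked for the example.

```php
<?php
// Hypothetical sketch: an in-memory array stands in for the DB table
// that maps permission hashes to the roles allowed to use them.
final class Auth
{
    /** @var array<string, string[]> permission hash => allowed roles */
    private array $rolesByHash = [];

    // Turn a permission name into its hashed ID (algorithm choice is an assumption).
    public static function permissionId(string $permission): string
    {
        return hash('crc32b', $permission);
    }

    // Record that $role may perform $permission.
    public function grant(string $role, string $permission): void
    {
        $this->rolesByHash[self::permissionId($permission)][] = $role;
    }

    /** @param string[] $userRoles roles held by the user being checked */
    public function isAllowed(array $userRoles, string $permission): bool
    {
        $allowed = $this->rolesByHash[self::permissionId($permission)] ?? [];
        return count(array_intersect($userRoles, $allowed)) > 0;
    }
}

$auth = new Auth();
$auth->grant('member', 'blog.comment');

var_dump($auth->isAllowed(['member'], 'blog.comment'));
var_dump($auth->isAllowed(['member'], 'profile.avatar.change'));
```

With a real database, the array lookup would become a query on an indexed hash column; the hashing step stays the same.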

ezpresso asked Feb 06 '17 17:02



2 Answers

My first thought was: why not simply use the md5 function?

Trying to write a hash myself

One of the most frequently cited hash functions is Bernstein's simple hash, also referred to as Times 33 with Addition. The Zend Engine uses it in PHP to hash the keys of associative arrays. In PHP it could be implemented as follows:

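function djb2($s){
    $length = strlen($s);

    $hashAddress = 5381;
    for ($counter = 0; $counter < $length; $counter++){
        // hash * 33 + byte value: ord() turns the character into an integer,
        // and the mask keeps the result within 32 bits so it cannot overflow
        // into a float (assumes 64-bit PHP)
        $hashAddress = ((($hashAddress << 5) + $hashAddress) + ord($s[$counter])) & 0xFFFFFFFF;
    }
    return $hashAddress;
}
echo djb2("stackoverflow");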

The problem is that an implementation in PHP code like this is rather slow. Tests show that it is about 3 times slower than md5. So we have to find the fastest built-in implementation of a hash function.
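
A comparison of this kind could be run roughly as shown here. It is a minimal sketch, not the original test: the `djb2_32` and `timeCalls` names and the iteration count are assumptions made for illustration, the hash is masked to 32 bits (assuming 64-bit PHP), and the measured ratio will depend on the PHP version and the machine.

```php
<?php
// Userland "Times 33 with Addition" hash, masked to 32 bits (assumes 64-bit PHP).
function djb2_32(string $s): int
{
    $hash = 5381;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $hash = (($hash << 5) + $hash + ord($s[$i])) & 0xFFFFFFFF;
    }
    return $hash;
}

// Call $fn on $input $iterations times and return the elapsed seconds.
function timeCalls(callable $fn, string $input, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($input);
    }
    return microtime(true) - $start;
}

$iterations = 100000;
$userland = timeCalls('djb2_32', 'stackoverflow', $iterations);
$builtin  = timeCalls('md5', 'stackoverflow', $iterations);

printf("djb2 (PHP code): %.3f sec\n", $userland);
printf("md5 (built-in):  %.3f sec\n", $builtin);
```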

Finding the best internal hash

Take every available algorithm and measure the time needed to hash a string one million times.

function testing($algo, $str) {
    $start = microtime(true);
    for($ax = 0; $ax < 1000000; $ax++){
        hash($algo, $str);
    }

    $end = microtime(true);
    return ($end - $start);
}


$algos = hash_algos();
$times = [];

foreach($algos as $algo){
    $times[$algo] = testing($algo, "stackoverflow");
}

// sort by time ASC
asort($times);

foreach($times as $algo => $time){
    echo "$algo -> " . round($time, 2)."sec\n";
}

My results were:

fnv1a32 -> 0.29sec
fnv132 -> 0.3sec
crc32b -> 0.3sec
adler32 -> 0.3sec
crc32 -> 0.31sec
joaat -> 0.31sec
fnv1a64 -> 0.31sec
fnv164 -> 0.31sec
md4 -> 0.46sec
md5 -> 0.54sec
...
md2 -> 6.32sec

The results change slightly from run to run; the first 8 algorithms swap places because their speeds are so close and the timings depend on the server load.
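
One common way to reduce that run-to-run noise is to repeat each measurement several times and keep the fastest run. The sketch below illustrates the idea; the `bestTime` helper, its default iteration and run counts, and the three algorithms chosen are assumptions for the example, not part of the original benchmark.

```php
<?php
// Return the fastest of $runs timings for hashing $str $iterations times with $algo.
function bestTime(string $algo, string $str, int $iterations = 100000, int $runs = 5): float
{
    $best = INF;
    for ($run = 0; $run < $runs; $run++) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            hash($algo, $str);
        }
        // Keep the minimum: slower runs are more likely to include load spikes.
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

foreach (['fnv1a32', 'crc32b', 'md5'] as $algo) {
    printf("%s -> %.3f sec\n", $algo, bestTime($algo, 'stackoverflow'));
}
```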

What should be chosen?

You can take any of the top 8 functions above, for example: $hash = hash('crc32', $string); In fact, the widely used md5 function is only about 1.7 to 1.9 times slower than the leaders (0.54 sec versus 0.29-0.31 sec).
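
If an integer ID is more convenient than a hex string (for an indexed integer column, say), PHP's crc32() function returns the same checksum as hash('crc32b', ...) in numeric form. The snippet below is a small sketch of that relationship, assuming 64-bit PHP, where crc32() is never negative. Note that hash('crc32', ...) without the "b" is a different CRC variant, so its output does not match crc32().

```php
<?php
$permission = 'profile.avatar.change';

// Same CRC-32 checksum in two forms: 8 hex characters and an integer.
$hex = hash('crc32b', $permission);
$id  = crc32($permission);

// Formatting the integer as zero-padded hex reproduces the hash() output.
var_dump($hex === sprintf('%08x', $id));
echo "$permission => $hex ($id)\n";
```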

Bonus

There are other functions, such as SuperFastHash, that are not built into PHP, but they are about 4 times faster than crc32.

shukshin.ivan answered Nov 11 '22 19:11


The processing time of a hashing function can be considered negligible in most cases. If you need a short hash (8 hexadecimal characters), you can simply use the crc32 algorithm.

<?php
$hash = hash('crc32', 'WhatDoYouWant');
?>

You can also combine hash with uniqid to create a unique hash. Note that uniqid is based on the current time, so the result is unique rather than truly random.

<?php
$hash = hash('crc32', uniqid());
?>
arnolem answered Nov 11 '22 17:11