I need a very fast string hashing function that fits well with a web application written in PHP.
The problem I am trying to solve is assigning IDs to permissions in an access control system. I am thinking about using hashed strings as permission IDs. This way I will be able to check permissions like this:
if ($Auth->isAllowed($user, "blog.comment")) {
// Do some operation
}
...
if ($Auth->isAllowed($user, "profile.avatar.change")) {
// Do some other operation
}
The DB table will map permission hashes to user roles. To check that the user is allowed to do "profile.avatar.change", the corresponding string will be hashed and looked up in the DB table.
This is very handy, and there will be no need to worry about maintaining unique permission IDs across different modules. But the hashing function should be very efficient.
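To make the idea concrete, here is a minimal sketch of such a lookup. The helper names permissionId() and isAllowed(), the role names, and the in-memory array standing in for the DB table are all assumptions made for illustration; only crc32() is a real PHP built-in.

```php
<?php
// Hypothetical helper: turn a permission name into a compact integer ID.
// crc32() is built in and returns a non-negative integer on 64-bit PHP.
function permissionId(string $permission): int
{
    return crc32($permission);
}

// Stand-in for the DB table: role name => set of allowed permission IDs.
// A real system would query the database instead of reading this array.
$rolePermissions = [
    'member' => [
        permissionId('blog.comment') => true,
        permissionId('profile.avatar.change') => true,
    ],
    'guest' => [],
];

// Hypothetical check: hash the requested permission and look it up.
function isAllowed(array $rolePermissions, string $role, string $permission): bool
{
    return isset($rolePermissions[$role][permissionId($permission)]);
}

var_dump(isAllowed($rolePermissions, 'member', 'blog.comment')); // bool(true)
var_dump(isAllowed($rolePermissions, 'guest', 'blog.comment'));  // bool(false)
```

With a 32-bit hash, two different permission names can in principle map to the same ID, so it may be worth checking the full list of names for collisions whenever a module adds a new one.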
Calculating a hash value in Python is fast as well, even for very large inputs. On a modern computer, calling hash() with a string of 100 million characters as the argument returns almost immediately.
A good hash function to use with integer key values is the mid-square method. The mid-square method squares the key value, and then takes out the middle r bits of the result, giving a value in the range 0 to 2^r - 1. This works well because most or all bits of the key value contribute to the result.
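The mid-square method could be sketched in PHP roughly as follows. The function name midSquare() and the $keyBits parameter (the assumed bit width of the key, 16 by default) are illustrative choices, not part of any library; the sketch also assumes 64-bit PHP so that the square fits in an integer.

```php
<?php
// Mid-square hash: square the key and keep the middle $r bits.
// Assumes 64-bit PHP and keys narrower than 32 bits, so the square fits in an int.
function midSquare(int $key, int $r, int $keyBits = 16): int
{
    $square = $key * $key;                    // up to 2 * $keyBits bits wide
    $shift  = intdiv(2 * $keyBits - $r, 2);   // number of low-order bits to drop
    return ($square >> $shift) & ((1 << $r) - 1); // keep r bits: 0 .. 2^r - 1
}

echo midSquare(4567, 8), "\n"; // 4567^2 = 20857489, middle 8 bits -> 228
```

For a 16-bit key and r = 8, the square has 32 bits, so 12 bits are dropped on the low side and the next 8 bits are kept.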
You could use an existing hash algorithm that produces something short, like MD5 (128 bits) or SHA-1 (160 bits). You can then shorten the digest further by XORing sections of it with other sections. This will increase the chance of collisions, but not as badly as simply truncating the digest.
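As a rough illustration of XORing sections of a digest together, the sketch below folds an MD5 digest into a single 32-bit value and shows plain truncation next to it for comparison. The function names md5Fold32() and md5Truncate32() are made up for this example; md5() and unpack() are standard PHP functions, and 64-bit PHP is assumed.

```php
<?php
// Fold an MD5 digest (128 bits) down to 32 bits by XORing its four 32-bit words.
function md5Fold32(string $s): int
{
    // md5($s, true) returns the raw 16-byte digest; 'N4' unpacks four
    // unsigned 32-bit big-endian integers into array keys 1..4.
    $words = unpack('N4', md5($s, true));
    return $words[1] ^ $words[2] ^ $words[3] ^ $words[4];
}

// Plain truncation for comparison: keep only the first 32 bits of the digest.
function md5Truncate32(string $s): int
{
    return unpack('N', md5($s, true))[1];
}

printf("%08x\n", md5Fold32('profile.avatar.change'));
printf("%08x\n", md5Truncate32('profile.avatar.change'));
```

Both variants yield a value between 0 and 2^32 - 1, so either can be stored in an unsigned 32-bit column.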
Hashing is the process of transforming a given key or string of characters into another value. The result is usually a shorter, fixed-length value or key that stands in for the original string and makes it easier to find or use. The most popular use of hashing is the implementation of hash tables.
My first thought was: why not just use the simple md5 function?
Trying to write a hash myself
One of the most frequently cited functions is Bernstein's simple hash function, also referred to as Times 33 with Addition. It is used in PHP by Zend to hash the keys of associative arrays. In PHP it could be implemented as follows:
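function djb2($s){
    $length = strlen($s);
    $hashAddress = 5381;
    for ($counter = 0; $counter < $length; $counter++){
        // hash * 33 + byte value; ord() turns the character into an integer
        $hashAddress = (($hashAddress << 5) + $hashAddress) + ord($s[$counter]);
        // keep the value within 32 bits so it never overflows (64-bit PHP assumed)
        $hashAddress &= 0xFFFFFFFF;
    }
    return $hashAddress;
}
echo djb2("stackoverflow");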
The problem is that, implemented this way, it is rather slow. Tests show that it is about 3 times slower than md5. So we have to find the fastest built-in implementation of a hash function.
Finding the best internal hash
Just take all the available algorithms and measure how long each one needs to hash a string a million times.
function testing($algo, $str) {
$start = microtime(true);
for($ax = 0; $ax < 1000000; $ax++){
hash($algo, $str);
}
$end = microtime(true);
return ($end - $start);
}
$algos = hash_algos();
$times = [];
foreach($algos as $algo){
$times[$algo] = testing($algo, "stackoverflow");
}
// sort by time ASC
asort($times);
foreach($times as $algo => $time){
echo "$algo -> " . round($time, 2)."sec\n";
}
My results were:
fnv1a32 -> 0.29sec
fnv132 -> 0.3sec
crc32b -> 0.3sec
adler32 -> 0.3sec
crc32 -> 0.31sec
joaat -> 0.31sec
fnv1a64 -> 0.31sec
fnv164 -> 0.31sec
md4 -> 0.46sec
md5 -> 0.54sec
...
md2 -> 6.32sec
The results change slightly from run to run: the first 8 algorithms swap places because their speeds are so close and the timings depend on the server load.
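One way to reduce this run-to-run noise is to repeat each measurement several times and keep the fastest run. The sketch below is only an illustration of that idea: the function name bestTime() and its default iteration and run counts are arbitrary assumptions, and hrtime() requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: time $iterations calls of hash() and repeat the whole
// measurement $runs times, keeping the fastest run to damp server-load noise.
function bestTime(string $algo, string $str, int $iterations = 100000, int $runs = 5): float
{
    $best = INF;
    for ($run = 0; $run < $runs; $run++) {
        $start = hrtime(true);                // monotonic clock, in nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            hash($algo, $str);
        }
        $best = min($best, (hrtime(true) - $start) / 1e9);
    }
    return $best;                             // fastest run, in seconds
}

foreach (['fnv1a32', 'crc32b', 'md5'] as $algo) {
    printf("%s -> %.3f sec\n", $algo, bestTime($algo, 'stackoverflow'));
}
```

Taking the minimum rather than the average is a design choice: background load can only make a run slower, so the fastest run is usually the closest to the algorithm's real cost.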
What should be chosen?
You can take any of the top 8 functions above, e.g. $hash = hash('crc32', $string); In fact, the widely used md5 function is only about 1.7 to 1.9 times slower than the leaders.
Bonus
There are other functions, such as SuperFastHash, that are not built into PHP, but they are 4x faster than crc32.
The processing time of a hashing function can be considered negligible in most cases. If you need a short hash (8 characters), you can simply use the crc32 algorithm.
<?php
$hash = hash('crc32', 'WhatDoYouWant');
?>
You can also combine hash with uniqid to create a random hash (uniqid is based on the current time, so the value changes on every call).
<?php
$hash = hash('crc32', uniqid());
?>