Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A quick string checksum function in Perl generating values in the 0..2^32-1 range

I'm looking for a Perl string checksum function with the following properties:

  • Input: Unicode string of undefined length ($string)
  • Output: Unsigned integer ($hash), for which 0 <= $hash <= 2^32-1 holds (0 to 4294967295, matching the size of a 4-byte MySQL unsigned int)

Pseudo-code:

sub checksum {
    my $string = shift;
    my $hash;
    ... checksum logic goes here ...
    die unless ($hash >= 0);
    die unless ($hash <= 4_294_967_295);
    return $hash;
}

Ideally the checksum function should be quick to run and should generate values somewhat uniformly in the target space (0 .. 2^32-1) to avoid collisions. In this application random collisions are totally non-fatal, but obviously I want to avoid them to the extent that it is possible.

Given these requirements, what is the best way to solve this?

like image 435
knorv Avatar asked Dec 22 '09 12:12

knorv


3 Answers

From perldoc -f unpack:

        For example, the following computes the same number as the
        System V sum program:

            $checksum = do {
                local $/;  # slurp!
                unpack("%32W*",<>) % 65535;
            };
like image 190
Randal Schwartz Avatar answered Nov 13 '22 17:11

Randal Schwartz


Any hash function will be sufficient - simply truncate it to 4-bytes and convert to a number. Good hash functions have a random distribution, and this distribution will be constant no matter where you truncate the string.

I suggest Digest::MD5 because it is the fastest hash implementation that comes with Perl as standard. String::CRC, as Pim mentions, is also implemented in C and should be faster.

Here's how to calculate the hash and convert it to an integer:

use Digest::MD5 qw(md5);
my $str = substr( md5("String-to-hash"), 0, 4 );
print unpack('L', $str);  # Convert to 4-byte integer (long)
like image 24
rjh Avatar answered Nov 13 '22 18:11

rjh


Don't know how quick it is, but you might try String::CRC.

like image 4
Pim Avatar answered Nov 13 '22 19:11

Pim