Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Simple string hashing function

I'm trying to hash a string into an integer for placing it in an array. However I do not know all too much about hashing functions, and that's why my current method is just adding all the ASCII numbers of the characters together and taking it mod the array size.

Are there any simple faster/better methods?

like image 855
Hal Avatar asked Sep 11 '10 10:09

Hal


2 Answers

I've tried many fast hash functions and chosen this one:

function StrHash(const st:string):cardinal; 
 var
  i:integer;
 begin
  result:=0;
  for i:=1 to length(st) do
   result:=result*$20844 xor byte(st[i]);
 end;

It is as fast as K&R function (actually even faster) but makes better (more even) distribution.

like image 152
Ivan Polyacov Avatar answered Sep 28 '22 05:09

Ivan Polyacov


As Dummy00001 pointed out, this has been asked and answered before. Take a look at Best algorithm for hashing number values?, particularly the suggestion of using MurmurHash.

I'd recommend MurmurHash because:

  1. It's very fast.

  2. Its distribution and avalanche characteristics are excellent for a non-cryptographic hash.

  3. Its worst-case behavior is still pretty good.

I've used it. It doesn't suck.

edit

There was a lot of discussion about how to best port it to Delphi, on https://forums.embarcadero.com/thread.jspa?threadID=13902&tstart=0. The resulting code is available at https://forums.codegear.com/thread.jspa?threadID=14879

Delphi translation

function Murmur2(const S: AnsiString; const Seed: LongWord=$9747b28c): LongWord;
var
    h: LongWord;
    len: LongWord;
    k: LongWord;
    data: Integer;
const
    // 'm' and 'r' are mixing constants generated offline.
    // They're not really 'magic', they just happen to work well.
    m = $5bd1e995;
    r = 24;
begin
    len := Length(S);

    //The default seed, $9747b28c, is from the original C library

    // Initialize the hash to a 'random' value
    h := seed xor len;

    // Mix 4 bytes at a time into the hash
    data := 1;

    while(len >= 4) do
    begin
        k := PLongWord(@S[data])^;

        k := k*m;
        k := k xor (k shr r);
        k := k* m;

        h := h*m;
        h := h xor k;

        data := data+4;
        len := len-4;
    end;

    {   Handle the last few bytes of the input array
            S: ... $69 $18 $2f
    }
    Assert(len <= 3);
    if len = 3 then
        h := h xor (LongWord(s[data+2]) shl 16);
    if len >= 2 then
        h := h xor (LongWord(s[data+1]) shl 8);
    if len >= 1 then
    begin
        h := h xor (LongWord(s[data]));
        h := h * m;
    end;

    // Do a few final mixes of the hash to ensure the last few
    // bytes are well-incorporated.
    h := h xor (h shr 13);
    h := h * m;
    h := h xor (h shr 15);

    Result := h;
end;

Passes all self-tests from the original C implementation.

like image 22
Steven Sudit Avatar answered Sep 28 '22 05:09

Steven Sudit