Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hash function for string with complexity O(N)

For example we have a string: "abc". Is it possible to create a hash function (with complexity O(N), where N is string length) that will do the following: for all permutation for string "abc" it will return the same result.

For example:

hash("abc") returns SC0wA //just an example value, not a real hash key
hash("bac") returns SC0wA
...
hash("cba") returns SC0wA

But for "bba" it will be:

hash("bba") return GD1z
hash("bab") return GD1z

UPD:

Hash function should not have any collisions for the whole alhpabet

like image 582
No Name QA Avatar asked Jun 23 '16 09:06

No Name QA


2 Answers

One simple algorithm could be:

int x = 0;
int s = 0;
for each character c in the string str
{
 x = x ^ c
 s = s + ASCII value of c
}

hash(str) = x + s

Collisions Handling

The reason I added the value s in the final answer is because suppose we have two strings s1 = "ab" and s2 = "ef", they would result in a collision just by xor operation,however after we add the value of sum of their ASCII values, they do not result in collision.

The xor operation also helps to avoid collision when sum of ASCII values of the characters is same. Suppose we have s1 = "ad" and s2 = "bc". By only considering the sums of ASCII values, it would result in collision but after the considering the xor operation as well it does not.

Also for strings of even length like "aaaa" and "bbbb" , if we only consider the xor operations, still we have collision but by adding the sums of ASCII values, we can avoid the collision.

So combining the sum of ASCII values of characters of the string and the xor operation, collision can be handled to a larger extent.

like image 161
Sumeet Avatar answered Oct 18 '22 08:10

Sumeet


Well, you could do it like this, in C# :

string text = "abc";
int hash = 0;

foreach(char c in text)
{
  int value = (int)c;
  hash += value;
}

The distribution of the hash value won't be great, but it does the job.

UPDATE: Since you've mentioned that the alphabet just consists of A-Z and a-z then another option is to map their position in the alphabet to bits in a long, with the uppercase characters taking up the first 26 bits and the lower case characters taking up the next 26 bits:

long MakeHash(string text)
{
    long hash = 0;
    long numberOfCharacters = 0;

    foreach(var c in text)
    {
        int offset = 0;

        if(c >= 'A' && c <='Z')
        {
            offset = c - 'A';
        }
        else
        {
            offset = (c - 'a') + 26;
        }

        hash |= (1L << offset);

        numberOfCharacters++;
    }

    hash |= (numberOfCharacters << 52);

    return hash;
}

Note how at the end the number of characters is OR'd into the bits 52 and beyond. Without this strings like aa and aaa would map to the same value as they'd all just set the a bit. With the length combined into the value you get a different value.

like image 23
Sean Avatar answered Oct 18 '22 09:10

Sean