For example we have a string: "abc". Is it possible to create a hash function (with complexity O(N), where N is string length) that will do the following: for all permutation for string "abc" it will return the same result.
For example:
hash("abc") returns SC0wA //just an example value, not a real hash key
hash("bac") returns SC0wA
...
hash("cba") returns SC0wA
But for "bba" it will be:
hash("bba") return GD1z
hash("bab") return GD1z
UPD:
Hash function should not have any collisions for the whole alhpabet
One simple algorithm could be:
int x = 0;
int s = 0;
for each character c in the string str
{
x = x ^ c
s = s + ASCII value of c
}
hash(str) = x + s
Collisions Handling
The reason I added the value s
in the final answer is because suppose we have two strings s1 = "ab"
and s2 = "ef"
, they would result in a collision just by xor
operation,however after we add the value of sum of their ASCII values
, they do not result in collision.
The xor
operation also helps to avoid collision when sum of ASCII values
of the characters is same. Suppose we have s1 = "ad"
and s2 = "bc"
. By only considering the sums of ASCII values
, it would result in collision but after the considering the xor
operation as well it does not.
Also for strings of even length like "aaaa" and "bbbb" , if we only consider the xor
operations, still we have collision but by adding the sums of ASCII values
, we can avoid the collision.
So combining the sum of ASCII values of characters of the string and the xor operation, collision can be handled to a larger extent.
Well, you could do it like this, in C# :
string text = "abc";
int hash = 0;
foreach(char c in text)
{
int value = (int)c;
hash += value;
}
The distribution of the hash value won't be great, but it does the job.
UPDATE: Since you've mentioned that the alphabet just consists of A-Z
and a-z
then another option is to map their position in the alphabet to bits in a long
, with the uppercase characters taking up the first 26 bits and the lower case characters taking up the next 26 bits:
long MakeHash(string text)
{
long hash = 0;
long numberOfCharacters = 0;
foreach(var c in text)
{
int offset = 0;
if(c >= 'A' && c <='Z')
{
offset = c - 'A';
}
else
{
offset = (c - 'a') + 26;
}
hash |= (1L << offset);
numberOfCharacters++;
}
hash |= (numberOfCharacters << 52);
return hash;
}
Note how at the end the number of characters is OR'd into the bits 52 and beyond. Without this strings like aa
and aaa
would map to the same value as they'd all just set the a
bit. With the length combined into the value you get a different value.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With