Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Does a string hash exist which can ignore the order of chars in this string

Tags:

hash

Does a string hash exist which can ignore the order of chars in this string? Eg."helloword" and "wordhello" can map into the same bucket.

like image 892
JoJo Avatar asked Oct 20 '22 15:10

JoJo


2 Answers

There is a number of different approaches you can take.

  • You can add the values of the characters together. (a + b + c is equal to a + c + b.) Unfortunately, this is the least desirable approach, since strings like "ac" and "bb" will generate the same hash value.

  • To reduce the possibility of hash code collisions, you can XOR the values together. (a ^ b ^ c is equal to a ^ c ^ b.) Unfortunately, this will not give a very broad distribution of random bits, so it will still give a high chance of collisions for different strings.

  • To even further reduce the possibility of hash code collisions, you can multiply the values of the characters together. (a * b * c is equal to a * c * b.)

  • If that's not good enough either, then you can sort all the characters in the string before applying the default string hashing function offered to you by whatever language it is that you are using. (So, both "helloword" ad "wordhello" would become "dehlloorw" before hashing, thus generating the same hash code.) The only disadvantage of this approach is that it is computationally more expensive than the others.

like image 158
Mike Nakis Avatar answered Oct 24 '22 02:10

Mike Nakis


Although the other suggestions of multiplying or adding characters would work, notice that such a hash function is not secure at all.

The reason is that it will introduce a ton of collisions and one of the main properties a hash function has is the low probability of collisions.

For example, a + b + c is the same as c + b + a. However, it is also the same as a + a + d (since the sum of the ascii characters are the same). The same thing applies for multiplying or xor-ing the numbers.

In sum, if you want to achieve a hash function which ignores order, you can but it will introduce a ton of collisions which will potentially make your program faulty and insecure.

like image 32
Preslav Mihaylov Avatar answered Oct 24 '22 02:10

Preslav Mihaylov