 

Is it possible to compare two strings by their "hash" numbers?

I have a string that is lost forever. The only thing I have left of it is a "magic" hash number. Now I have a new string, which could be similar or equal to the lost one, and I need to find out how close the two are.

int savedHash = 352736;
String newText = "this is new string";
if (Math.abs(hash(newText) - savedHash) < 100) {
  // wow, they are very close!
}

Are there any algorithms for this purpose?

P.S. The length of the text is not fixed.

P.P.S. I know how ordinary hash codes work. I'm interested in an algorithm that works differently and gives me the functionality described above.

P.P.P.S. In a very simple scenario, this hash() method could look like this:

public int hash(String txt) {
  return txt.length();
}
yegor256 asked Dec 12 '22 14:12



1 Answer

Standard hashing will not work in this case, because close hash values do not imply close strings. In fact, most hash functions (cryptographic ones in particular) are deliberately designed to give similar strings very different values, so that any given set of input strings produces an essentially random distribution of hash values.
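
To illustrate this, here is a minimal sketch that hashes two strings differing in a single character with SHA-256 (via the JDK's MessageDigest) and counts how many of the 256 output bits differ. The class and method names are illustrative only. Java's own String.hashCode() is a simple polynomial hash rather than one designed this way, but even there numeric closeness says little about similarity: changing the first character of an n-character string shifts the value by a multiple of 31^(n-1), modulo 2^32.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class AvalancheDemo {
    // Returns the SHA-256 digest of the UTF-8 bytes of s.
    static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Counts how many bits differ between two equal-length byte arrays.
    static int differingBits(byte[] a, byte[] b) {
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits: bytes are sign-extended when promoted to int.
            count += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] h1 = sha256("this is new string");
        byte[] h2 = sha256("this is new strinG"); // one character changed
        // Typically around half of the 256 bits differ.
        System.out.println(differingBits(h1, h2) + " of 256 bits differ");
    }
}
```

Because roughly half of the bits flip for a one-character change, the distance between two such digests carries no information about how similar the inputs were.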

If you had access to both strings, you could use a string distance function such as the Levenshtein distance. It computes the edit distance between two strings: the minimum number of single-character insertions, deletions, or substitutions required to transform one into the other.
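
For the case where both strings are available, here is a minimal sketch of the classic dynamic-programming Levenshtein algorithm, keeping only two rows of the table in memory. The class name is illustrative, and it compares Java chars (UTF-16 code units), so characters outside the Basic Multilingual Plane count as two.

```java
public class Levenshtein {
    // Edit distance with unit cost for insertion, deletion, and substitution.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                            prev[j] + 1),     // deletion
                                   prev[j - 1] + cost);       // substitution or match
            }
            // Reuse the arrays: the current row becomes the previous one.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));                        // prints 3
        System.out.println(distance("this is new string", "this is a new string")); // prints 2
    }
}
```

This runs in O(|a|·|b|) time, which is fine for short texts but is exactly what the question cannot do, since the original string is gone.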

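In this case, however, the best approach might be a fuzzy hashing technique, such as a locality-sensitive hash (for example SimHash) or context-triggered piecewise hashing (as used by ssdeep). Such hashes map similar inputs to similar fingerprints, so you don't have to store the original string and can still get some measure of similarity.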
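
As one concrete example of fuzzy hashing, here is a minimal SimHash sketch: each overlapping character 3-gram is hashed to 64 bits and votes on every bit, and the fingerprint keeps the majority. The shingle size k = 3 and the use of String.hashCode() followed by the MurmurHash3 fmix64 finalizer as the per-shingle hash are assumptions chosen for brevity, not part of any standard. Note that closeness is measured as the Hamming distance between fingerprints (number of differing bits), not as the numeric difference used in the question's Math.abs(...) < 100 check.

```java
public class SimHash {
    // MurmurHash3 fmix64 finalizer: spreads the bits of the input across all 64 bits.
    static long mix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // SimHash fingerprint over overlapping character k-grams ("shingles").
    static long simHash(String text, int k) {
        int[] weights = new int[64];
        // Texts shorter than k are treated as a single shingle.
        int shingles = Math.max(1, text.length() - k + 1);
        for (int i = 0; i < shingles; i++) {
            String shingle = text.substring(i, Math.min(text.length(), i + k));
            long h = mix64(shingle.hashCode());
            // Each shingle votes +1 (bit set) or -1 (bit clear) on every bit position.
            for (int bit = 0; bit < 64; bit++) {
                weights[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (weights[bit] > 0) {
                fingerprint |= 1L << bit;
            }
        }
        return fingerprint;
    }

    // Number of differing bits: small values suggest similar texts.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long saved = simHash("this is the lost string", 3); // store this instead of the text
        long similar = simHash("this is the lost strings", 3);
        long different = simHash("completely unrelated words here", 3);
        System.out.println(hammingDistance(saved, similar));   // typically small
        System.out.println(hammingDistance(saved, different)); // typically around 32
    }
}
```

Only the 64-bit fingerprint needs to be stored. A threshold on the Hamming distance (for example, a few bits) would then play the role of the question's "< 100" check; the right threshold depends on the data and would have to be tuned.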

Avi answered Dec 21 '22 23:12
