
How to compute "m" in Jaro-Winkler distance?

To compute the Jaro distance of two strings, we use this equation:

dj = 1/3 (m/|s1| + m/|s2| + (m-t)/m)

How should I compute "m" for two strings in this equation?

If "m" is the difference between the two strings, why is m equal to 6 in the Wikipedia example for the two strings "MARTHA" and "MARHTA"? I think it should be 1, because the difference between the strings is 1, not 6. Am I right?

soodeh p asked Sep 03 '13 14:09


1 Answer

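m is not the difference between the strings. It is the number of matching characters: a character of one string matches if the same character occurs in the other string at a position no farther away than d = floor(max(len(String1), len(String2)) / 2) - 1 (thanks Michael Foukarakis), and each character can be matched only once. For MARTHA and MARHTA, d = floor(6 / 2) - 1 = 2 and all six characters find a partner within that window, so m = 6.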
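As a rough illustration of the matching rule above, here is a minimal Python sketch that counts m. The function name `count_matches` is made up for this example, and clamping d to 0 for very short strings is an assumption borrowed from common implementations, not something stated in the formula.

```python
def count_matches(s1: str, s2: str) -> int:
    """Count characters of s1 that have an unused equal character in s2
    within the allowed window d (the "m" of the Jaro formula)."""
    # Window size from the answer; clamped at 0 (assumption) so that
    # one-character strings can still match themselves.
    d = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used = [False] * len(s2)  # each character of s2 may be matched once
    m = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - d)
        hi = min(len(s2), i + d + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                m += 1
                break
    return m


print(count_matches("MARTHA", "MARHTA"))  # 6
```

The T and H of MARTHA sit one position away from their partners in MARHTA, which is inside the window d = 2, so they still count toward m.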

t is the number of transpositions: the number of matching characters that appear in a different order in the two strings, divided by 2. In this case, 2 characters (H and T) match but are in swapped order, so t = 2/2 = 1. Plugging in gives dj = 1/3 (6/6 + 6/6 + (6-1)/6) ≈ 0.944.
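To make the transposition count concrete, the following Python sketch computes m, t, and dj together. It is only one straightforward way to do it, not a reference implementation: the name `jaro` and its return shape are invented for this example, and returning 0.0 when there are no matches is an assumed convention.

```python
def jaro(s1: str, s2: str) -> tuple[int, int, float]:
    """Return (m, t, dj) for two strings, following the formula in the question."""
    if not s1 or not s2:
        return 0, 0, 0.0
    d = max(max(len(s1), len(s2)) // 2 - 1, 0)  # match window, clamped (assumption)
    flags1 = [False] * len(s1)
    flags2 = [False] * len(s2)
    for i, ch in enumerate(s1):
        for j in range(max(0, i - d), min(len(s2), i + d + 1)):
            if not flags2[j] and s2[j] == ch:
                flags1[i] = flags2[j] = True
                break
    m = sum(flags1)
    if m == 0:
        return 0, 0, 0.0  # assumed convention: no matches means similarity 0
    # Matched characters of each string, in their original order.
    seq1 = [c for c, f in zip(s1, flags1) if f]
    seq2 = [c for c, f in zip(s2, flags2) if f]
    # Positions where the two sequences disagree, halved.
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    dj = (m / len(s1) + m / len(s2) + (m - t) / m) / 3
    return m, t, dj


m, t, dj = jaro("MARTHA", "MARHTA")
print(m, t, round(dj, 3))  # 6 1 0.944
```

The matched sequences are MARTHA and MARHTA themselves; they disagree at two positions (T/H and H/T), which is where t = 2/2 = 1 comes from.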

Bitwise answered Sep 17 '22 06:09