To compute the Jaro distance of two strings, we use this equation:
dj = 1/3 (m/|s1| + m/|s2| + (m-t)/m)
How should I compute "m" for two strings in this equation?
If "m" is the difference between the two strings, why is m equal to 6 in the Wikipedia example for the two strings "MARTHA" and "MARHTA"? I think it should be 1, because the difference between the strings is 1, not 6. Am I right?
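"m" is not the difference between the strings; it is the number of matching characters. A character of one string counts as a match if the same character occurs in the other string at a position no more than d = floor(max(len(String1), len(String2)) / 2) - 1 away (thanks Michael Foukarakis), and each character can be matched only once. For "MARTHA" and "MARHTA", d = floor(6 / 2) - 1 = 2, and all six characters match within that window, so m = 6.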
"t" is the number of transpositions: the number of matching characters that appear in a different order in the two strings, divided by 2. In this case, 2 matching characters ("H" and "T") appear in a different order, so t = 2/2 = 1.
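To see how m and t come out of the definitions, here is a minimal Python sketch. It is only one common way to implement the matching, not a reference implementation: the function name jaro, the greedy left-to-right matching, and the choice to return 0 for empty or non-matching strings are my own assumptions.

```python
def jaro(s1, s2):
    """Return (m, t, dj) for two strings (illustrative sketch)."""
    # Assumption: an empty string gives m = 0 and dj = 0.
    if not s1 or not s2:
        return 0, 0.0, 0.0

    # Matching window d = floor(max(len) / 2) - 1, never negative.
    d = max(max(len(s1), len(s2)) // 2 - 1, 0)

    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0

    # Greedily match each character of s1 to the first unused,
    # equal character of s2 that lies within the window.
    for i, ch in enumerate(s1):
        lo = max(0, i - d)
        hi = min(len(s2), i + d + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = True
                matched2[j] = True
                m += 1
                break

    if m == 0:
        return 0, 0.0, 0.0

    # Matched characters of each string, in their original order.
    seq1 = [c for c, hit in zip(s1, matched1) if hit]
    seq2 = [c for c, hit in zip(s2, matched2) if hit]

    # t = half the number of positions where the two sequences differ.
    t = sum(a != b for a, b in zip(seq1, seq2)) / 2

    dj = (m / len(s1) + m / len(s2) + (m - t) / m) / 3
    return m, t, dj


print(jaro("MARTHA", "MARHTA"))
```

For "MARTHA" and "MARHTA" this gives m = 6 and t = 1, so dj = 1/3 (6/6 + 6/6 + (6-1)/6) = 17/18, which is approximately 0.944.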