
Human name comparison: ways to approach this task

I'm not a Natural Language Processing student, yet I know this isn't a trivial strcmp(n1, n2).

Here's what I've learned so far:

  • comparing Personal Names can't be solved 100%
  • there are ways to achieve a certain degree of accuracy.
  • the answer will be locale-specific, that's OK.

I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.

For example, all the names below can refer to the same person:

  • Berry Tsakala
  • Bernard Tsakala
  • Berry J. Tsakala
  • Tsakala, Berry

I'm trying to:

  1. build (or copy) an algorithm which grades the relationship between 2 input names
  2. find an indexing method (for names in my database, for hash tables, etc.)

Note: my task isn't about finding names in text, but about comparing 2 names, e.g.:

name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%
Berry Tsakala asked Jan 28 '26 08:01



2 Answers

I used the Tanimoto coefficient for a quick (but not great) solution, in Python:

"""
Formula:
  Na = number of set A elements
  Nb = number of set B elements
  Nc = number of common items

  T = Nc / (Na + Nb - Nc)
"""
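def tanimoto(a, b):
    # Elements of a that also occur in b. For strings this walks a character
    # by character and keeps repeats, so a and b are not reduced to true sets.
    c = [v for v in a if v in b]
    return float(len(c)) / (len(a) + len(b) - len(c))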

def name_compare(name1, name2):
    return tanimoto(name1, name2)


>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>> 
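
Because the function above compares characters, word order still affects the score. A possible variation, sketched here under assumptions that are not part of the answer, is to apply the same Tanimoto formula to sets of lowercase word tokens. The helper names `name_tokens`, `tanimoto_sets` and `token_name_compare` are hypothetical, and the code assumes Python 3 division:

```python
import re

def name_tokens(name):
    """Lowercase a name and return its words as a set (punctuation dropped)."""
    return set(re.findall(r"[^\W\d_]+", name.lower()))

def tanimoto_sets(a, b):
    """Tanimoto coefficient of two sets: Nc / (Na + Nb - Nc)."""
    if not a and not b:
        return 0.0  # assumption: two empty names are treated as unrelated
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def token_name_compare(name1, name2):
    """Hypothetical word-level counterpart of name_compare."""
    return tanimoto_sets(name_tokens(name1), name_tokens(name2))

print(token_name_compare("James Brown", "Brown, James"))       # 1.0
print(token_name_compare("Berry Tsakala", "Bernard Tsakala"))  # one shared word out of three
```

Reordered names now score 1.0, but "Berry" and "Bernard" count as entirely different words, so this trades tolerance for nicknames against robustness to ordering.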

Edit: A link to a good and useful book.

Nick Dandoulakis answered Jan 31 '26 17:01



Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
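
As a rough illustration of that idea, here is a minimal sketch of American Soundex combined with a comma check for "Last, First" input. The function names (`soundex`, `first_last`, `soundex_key`, `same_soundex`) are hypothetical, and reducing a name to its first and last word is a simplifying assumption that drops middle names and initials:

```python
# Letter groups of American Soundex; vowels, h, w and y carry no digit.
SOUNDEX_CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        SOUNDEX_CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex code of one word, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":  # h and w do not separate equal digits; vowels do
            prev = digit
    return (code + "000")[:4]

def first_last(name):
    """Return (first, last); 'Last, First' input is reordered at the comma."""
    if "," in name:
        last, _, rest = name.partition(",")
        given = rest.split()
        return (given[0] if given else ""), last.strip()
    parts = name.split()
    if not parts:
        return "", ""
    return parts[0], parts[-1]

def soundex_key(name):
    """Hypothetical comparison key: (Soundex of last name, Soundex of first name)."""
    first, last = first_last(name)
    return soundex(last), soundex(first)

def same_soundex(name1, name2):
    return soundex_key(name1) == soundex_key(name2)

print(soundex_key("James Brown"))                    # ('B650', 'J520')
print(same_soundex("James Brown", "Brown, James"))   # True
print(same_soundex("Berry Tsakala", "Bernard Tsakala"))  # False
```

The key is hashable, so it could also double as a dictionary or database index key. Soundex only groups similar-sounding spellings, though, so a nickname such as "Berry" for "Bernard" still lands under a different key.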

Jacob answered Jan 31 '26 15:01



