Human name comparison: ways to approach this task

Question

I'm not a Natural Language Processing student, yet I know this is not a trivial strcmp(n1,n2).

Here's what I've learned so far:

Comparing personal names can't be solved 100%.
There are ways to achieve a certain degree of accuracy.
The answer will be locale-specific, and that's OK.

I'm not looking for spelling alternatives! The assumption is that the input's spelling is correct.

For example, all the names below can refer to the same person:

Berry Tsakala
Bernard Tsakala
Berry J. Tsakala
Tsakala, Berry

I'm trying to:

build (or copy) an algorithm which grades the relationship between 2 input names
find an indexing method (for names in my database, for hash tables, etc.)

Note: my task isn't about finding names in text, but about comparing 2 names, e.g.:

name_compare( "James Brown", "Brown, James", "en-US" ) ---> 99.0%

Nick Dandoulakis · Accepted Answer

I used the Tanimoto coefficient for a quick (but not great) solution, in Python. Note that it compares the two names character by character, not word by word:

"""
Formula:
  Na = number of set A elements
  Nb = number of set B elements
  Nc = number of common items

  T = Nc / (Na + Nb - Nc)
"""
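def tanimoto(a, b):
    # Compare the sets of characters, so repeated letters are counted once
    # and the result always stays within [0, 1].
    a, b = set(a), set(b)
    c = a & b
    if not a and not b:
        return 1.0  # two empty inputs are treated as identical
    return float(len(c)) / (len(a) + len(b) - len(c))

def name_compare(name1, name2):
    return tanimoto(name1, name2)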


>>> name_compare("James Brown", "Brown, James")
0.91666666666666663
>>> name_compare("Berry Tsakala", "Bernard Tsakala")
0.75
>>>
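
Since the character-level score above rewards any two strings that share letters, a word-level variant may fit the "Last, First" case better. The following is only a sketch under my own assumptions: `name_tokens`, `tanimoto_sets` and `name_compare_tokens` are hypothetical helper names, and splitting on letters while ignoring case and punctuation is just one possible normalization.

```python
import re

def name_tokens(name):
    """Lowercase the name and return its set of alphabetic words."""
    # Commas and periods are dropped, so word order and "Last, First"
    # formatting no longer affect the result.
    return set(re.findall(r"[^\W\d_]+", name.lower()))

def tanimoto_sets(a, b):
    """Tanimoto coefficient of two sets: Nc / (Na + Nb - Nc)."""
    if not a and not b:
        return 1.0  # assumption: two empty names count as identical
    common = len(a & b)
    return common / (len(a) + len(b) - common)

def name_compare_tokens(name1, name2):
    return tanimoto_sets(name_tokens(name1), name_tokens(name2))
```

With this variant, "James Brown" and "Brown, James" score 1.0, while "Berry Tsakala" and "Bernard Tsakala" share one word out of three and score about 0.33. It does not know that one first name can be short for another; that would need a separate nickname table.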

Edit: A link to a good and useful book.

Jacob · Answer

Soundex is sometimes used to compare similar names. It doesn't deal with first name/last name ordering, but you could probably just have your code look for the comma to solve that problem.
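
A minimal sketch of that idea, assuming the classic American Soundex rules (first letter plus three digits, with h and w not separating equal codes). `soundex` is written out by hand here, and `name_key` is a hypothetical helper that handles the comma by swapping "Last, First" into "First Last" before encoding each word.

```python
# Letter groups and their Soundex digits; vowels, h, w and y have no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the 4-character Soundex code of a word, or '' if it has no letters."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # an uncoded letter (vowel) resets prev, allowing repeats
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters

def name_key(name):
    """Hypothetical helper: reorder 'Last, First', then Soundex each word."""
    if "," in name:
        last, _, first = name.partition(",")
        name = first + " " + last
    codes = [soundex(w) for w in name.split()]
    return [c for c in codes if c]
```

Here `name_key("Brown, James")` and `name_key("James Brown")` both give `['J520', 'B650']`, so the list (or a tuple of it) could serve as a lookup key in a hash table. Letters outside a-z are treated as having no code in this sketch, which is a limitation for non-English names.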

Tags: language-agnostic, nlp

Asked by Berry Tsakala
