Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to intelligently parse last name

Assuming western naming convention of FirstName MiddleName(s) LastName,

What would be the best way to correctly parse out the last name from a full name?

For example:

John Smith --> 'Smith'
John Maxwell Smith --> 'Smith'
John Smith Jr --> 'Smith Jr'
John van Damme --> 'van Damme'
John Smith, IV --> 'Smith, IV'
John Mark Del La Hoya --> 'Del La Hoya'

...and the countless other permutations from this.

like image 523
David542 Avatar asked Dec 02 '22 01:12

David542


2 Answers

Probably the best answer here is not to try. Names are individual and idosyncratic and, even limiting yourself to the Western tradition, you can never be sure that you'll have thought of all the edge cases. A friend of mine legally changed his name to be a single word, and he's had a hell of a time dealing with various institutions whose procedures can't deal with this. You're in a unique position of being the one creating the software that implements a procedure, and so you have an opportunity to design something that isn't going to annoy the crap out of people with unconventional names. Think about why you need to be parsing out the last name to begin with, and see if there's something else you could do.

That being said, as a purely techincal matter the best way would probably be to trim off specifically the strings " Jr", ", Jr", ", Jr.", "III", ", III", etc. from the end of the string containing the name, and then get everything from the last space in the string to the (new, after having removed Jr, etc.) end. This wouldn't get, say, "Del La Hoya" from your example, but you can't even really count on a human to get that - I'm making an educated guess that John Mark Del La Hoya's last name is "Del La Hoya" and not "Mark Del La Hoya" because I"m a native English speaker and I have some intuition about what Spanish last names look like - if the name were, say "Gauthip Yeidze Ka Illunyepsi" I would have absolutely no idea whether to count that Ka as part of the last name or not because I have no idea what language that's from.

like image 157
Tneuktippa Avatar answered Dec 04 '22 14:12

Tneuktippa


Came across a lib called "nameparser" at https://pypi.python.org/pypi/nameparser It handles four out of six cases above:

#!/usr/bin/env python
from nameparser import HumanName

def get_lname(somename):
    name = HumanName(somename)
    return name.last

people_names = [
    ('John Smith', 'Smith'),
    ('John Maxwell Smith', 'Smith'),
    # ('John Smith Jr', 'Smith Jr'),
    ('John van Damme', 'van Damme'),
    # ('John Smith, IV', 'Smith, IV'),
    ('John Mark Del La Hoya', 'Del La Hoya')
]

for name, target in people_names:
    print('{} --> {} <-- {}'.format(name, get_lname(name), target))
    assert get_lname(name) == target    
like image 24
Down the Stream Avatar answered Dec 04 '22 13:12

Down the Stream