Assuming western naming convention of FirstName MiddleName(s) LastName
,
What would be the best way to correctly parse out the last name from a full name?
For example:
John Smith --> 'Smith'
John Maxwell Smith --> 'Smith'
John Smith Jr --> 'Smith Jr'
John van Damme --> 'van Damme'
John Smith, IV --> 'Smith, IV'
John Mark Del La Hoya --> 'Del La Hoya'
...and the countless other permutations from this.
Probably the best answer here is not to try. Names are individual and idosyncratic and, even limiting yourself to the Western tradition, you can never be sure that you'll have thought of all the edge cases. A friend of mine legally changed his name to be a single word, and he's had a hell of a time dealing with various institutions whose procedures can't deal with this. You're in a unique position of being the one creating the software that implements a procedure, and so you have an opportunity to design something that isn't going to annoy the crap out of people with unconventional names. Think about why you need to be parsing out the last name to begin with, and see if there's something else you could do.
That being said, as a purely techincal matter the best way would probably be to trim off specifically the strings " Jr", ", Jr", ", Jr.", "III", ", III", etc. from the end of the string containing the name, and then get everything from the last space in the string to the (new, after having removed Jr, etc.) end. This wouldn't get, say, "Del La Hoya" from your example, but you can't even really count on a human to get that - I'm making an educated guess that John Mark Del La Hoya's last name is "Del La Hoya" and not "Mark Del La Hoya" because I"m a native English speaker and I have some intuition about what Spanish last names look like - if the name were, say "Gauthip Yeidze Ka Illunyepsi" I would have absolutely no idea whether to count that Ka as part of the last name or not because I have no idea what language that's from.
Came across a lib called "nameparser" at https://pypi.python.org/pypi/nameparser It handles four out of six cases above:
#!/usr/bin/env python
from nameparser import HumanName
def get_lname(somename):
name = HumanName(somename)
return name.last
people_names = [
('John Smith', 'Smith'),
('John Maxwell Smith', 'Smith'),
# ('John Smith Jr', 'Smith Jr'),
('John van Damme', 'van Damme'),
# ('John Smith, IV', 'Smith, IV'),
('John Mark Del La Hoya', 'Del La Hoya')
]
for name, target in people_names:
print('{} --> {} <-- {}'.format(name, get_lname(name), target))
assert get_lname(name) == target
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With