Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract names from string with python Regex

Tags:

python

regex

I've been trying to extract names from a string, but don't seem to be close to success.

Here is the code:

string = "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
regex = re.compile(r'([A-Z][a-z]+(?: [A-Z][a-z]\.)? [A-Z][a-z]+)')
print(regex.findall(string))

This is the output I'm getting:

['Moe Szyslak', 'Timothy Lovejoy', 'Ned Flanders', 'Julius Hibbert']
like image 714
MQaiser Avatar asked Mar 16 '19 06:03

MQaiser


People also ask

How do you split a string in regex in Python?

Regex example to split a string into words In this example, we will split the target string at each white-space character using the \s special sequence. Let's add the + metacharacter at the end of \s . Now, The \s+ regex pattern will split the target string on the occurrence of one or more whitespace characters.

How do I extract a substring between two markers in Python?

Using index() + loop to extract string between two substrings. In this, we get the indices of both the substrings using index(), then a loop is used to iterate within the index to find the required string between them.


1 Answers

Extracting human names even in English is notoriously hard. The following regex solves your particular problem but may fail on other inputs (e.g., it does not capture names with dashes):

re.findall(r"[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+", string)
#['Moe Szyslak', 'Burns, C. Montgomery', 'Timothy Lovejoy', 
# 'Ned Flanders', 'Simpson, Homer', 'Julius Hibbert']

And with titles:

TITLE = r"(?:[A-Z][a-z]*\.\s*)?"
NAME1 = r"[A-Z][a-z]+,?\s+"
MIDDLE_I = r"(?:[A-Z][a-z]*\.?\s*)?"
NAME2 = r"[A-Z][a-z]+"

re.findall(TITLE + NAME1 + MIDDLE_I + NAME2, string)
#['Moe Szyslak', 'Burns, C. Montgomery', 'Rev. Timothy Lovejoy', 
# 'Ned Flanders', 'Simpson, Homer', 'Dr. Julius Hibbert']

As a side note, there is no need to compile a regex unless you plan to reuse it.

like image 123
DYZ Avatar answered Sep 17 '22 22:09

DYZ