I've been trying to extract names from a string, but don't seem to be close to success.
Here is the code:
string = "555-1239Moe Szyslak(636) 555-0113Burns, C. Montgomery555 -6542Rev. Timothy Lovejoy555 8904Ned Flanders636-555-3226Simpson, Homer5553642Dr. Julius Hibbert"
regex = re.compile(r'([A-Z][a-z]+(?: [A-Z][a-z]\.)? [A-Z][a-z]+)')
print(regex.findall(string))
This is the output I'm getting:
['Moe Szyslak', 'Timothy Lovejoy', 'Ned Flanders', 'Julius Hibbert']
Regex example to split a string into words In this example, we will split the target string at each white-space character using the \s special sequence. Let's add the + metacharacter at the end of \s . Now, The \s+ regex pattern will split the target string on the occurrence of one or more whitespace characters.
Using index() + loop to extract string between two substrings. In this, we get the indices of both the substrings using index(), then a loop is used to iterate within the index to find the required string between them.
Extracting human names even in English is notoriously hard. The following regex solves your particular problem but may fail on other inputs (e.g., it does not capture names with dashes):
re.findall(r"[A-Z][a-z]+,?\s+(?:[A-Z][a-z]*\.?\s*)?[A-Z][a-z]+", string)
#['Moe Szyslak', 'Burns, C. Montgomery', 'Timothy Lovejoy',
# 'Ned Flanders', 'Simpson, Homer', 'Julius Hibbert']
And with titles:
TITLE = r"(?:[A-Z][a-z]*\.\s*)?"
NAME1 = r"[A-Z][a-z]+,?\s+"
MIDDLE_I = r"(?:[A-Z][a-z]*\.?\s*)?"
NAME2 = r"[A-Z][a-z]+"
re.findall(TITLE + NAME1 + MIDDLE_I + NAME2, string)
#['Moe Szyslak', 'Burns, C. Montgomery', 'Rev. Timothy Lovejoy',
# 'Ned Flanders', 'Simpson, Homer', 'Dr. Julius Hibbert']
As a side note, there is no need to compile a regex unless you plan to reuse it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With