
Python - Extracting all camel case words in a sequence

Tags: python, nltk

I am trying to return a list of all runs of consecutive words in a string that begin with a capital letter (title case).

For example, given the string "John Walker Smith is currently in New York", I would like to return the following list:

['John Walker Smith', 'New York']

My code below works only when there are two title-case words in a row. How do I extend it to pick up more than two title-case words in a sequence?

def get_composite_names(s):
    l = [x for x in s.split()]
    nouns = []
    for i in range(0,len(l)):
        if i > len(l)-2:
            break
        if l[i] == l[i].title() and l[i+1] == l[i+1].title():
            temp = l[i]+' '+l[i+1]
            nouns.append(temp)
    return nouns
asked May 31 '26 18:05 by jax


1 Answer

Here's one way to accomplish this without regex:

from itertools import groupby

string = "John Walker Smith is currently in New York"

groups = []

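# groupby collects adjacent words that share the same key,
# here whether the word starts with an uppercase letter
for key, group in groupby(string.split(), lambda x: x[0].isupper()):
    if key:
        groups.append(' '.join(group))

print(groups)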
# ['John Walker Smith', 'New York']
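
To drop this into the question's code, the same groupby idea could be wrapped in a function. The following is only a sketch: it reuses the name get_composite_names from the question, and the min_words parameter is a hypothetical addition (not part of the original code) for skipping runs shorter than a given length.

```python
from itertools import groupby


def get_composite_names(s, min_words=1):
    """Return runs of consecutive capitalized words found in s.

    min_words is a hypothetical extra: runs with fewer words are skipped.
    """
    names = []
    # str.split() never yields empty strings, so w[0] is always safe here.
    for is_capitalized, run in groupby(s.split(), key=lambda w: w[0].isupper()):
        words = list(run)
        if is_capitalized and len(words) >= min_words:
            names.append(' '.join(words))
    return names


print(get_composite_names("John Walker Smith is currently in New York"))
# ['John Walker Smith', 'New York']
```

Note that this checks only the first character of each word, so an all-caps word such as "USA" counts as capitalized, whereas the question's comparison against word.title() would reject it. Punctuation attached to a word (for example "York,") is kept as part of that word.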
answered Jun 02 '26 08:06 by cmaher