Parse text to get the proper nouns (names and organizations) - python nltk

Question

I am trying to extract proper nouns, i.e. person names and organization names, from very small chunks of text such as SMS messages. The basic approaches available with NLTK (see "Finding Proper Nouns using NLTK WordNet") can pick out nouns, but they fail when a proper noun does not start with a capital letter. In text like the following, names such as "sumit" are not recognized as proper nouns: only the capitalized "Samit" is tagged NNP, while "sumit" comes out as NN and "rajesh" as JJ.

>>> from nltk import pos_tag
>>> sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation last night @ around 8 pm last nite"
>>> tagged_sent = pos_tag(sentence.split())
>>> print(tagged_sent)
[('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'), ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'), ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN'), ('situation', 'NN'), ('last', 'JJ'), ('night', 'NN'), ('@', 'IN'), ('around', 'IN'), ('8', 'CD'), ('pm', 'NN'), ('last', 'JJ'), ('nite', 'NN')]
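For reference, the tag-based approach boils down to keeping tokens tagged NNP or NNPS, the Penn Treebank tags for singular and plural proper nouns. The sketch below (`proper_nouns` is an illustrative helper, not part of NLTK) applies that filter to the first part of the tagged output above, which makes the limitation concrete: the filter can only be as good as the tags, and the tagger relies heavily on capitalization.

```python
from typing import List, Tuple

# Penn Treebank tags for singular and plural proper nouns
PROPER_NOUN_TAGS = ("NNP", "NNPS")


def proper_nouns(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return the words whose POS tag marks them as a proper noun."""
    return [word for word, tag in tagged if tag in PROPER_NOUN_TAGS]


# First part of the tagger output shown in the question
tagged_sent = [('i', 'PRP'), ('spoke', 'VBP'), ('with', 'IN'), ('sumit', 'NN'),
               ('and', 'CC'), ('rajesh', 'JJ'), ('and', 'CC'), ('Samit', 'NNP'),
               ('about', 'IN'), ('the', 'DT'), ('gridlock', 'NN')]
print(proper_nouns(tagged_sent))
```

This prints ['Samit']: "sumit" (NN) and "rajesh" (JJ) are dropped even though they are names.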

user278064 · Accepted Answer

There is a better way to extract names of people and organizations: run NLTK's named-entity chunker, ne_chunk, over the POS-tagged tokens.

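import nltk
from nltk import pos_tag, ne_chunk
from nltk.tokenize import SpaceTokenizer

# Requires NLTK's POS-tagger and NE-chunker data (fetch them with nltk.download())
sentence = "i spoke with sumit and rajesh and Samit about the gridlock situation last night @ around 8 pm last nite"

tokenizer = SpaceTokenizer()
toks = tokenizer.tokenize(sentence)
pos = pos_tag(toks)          # [(word, POS tag), ...]
chunked_nes = ne_chunk(pos)  # Tree whose entity subtrees are labelled PERSON, ORGANIZATION, GPE, ...

# Keep only the entity subtrees and join their words back into one string each
nes = [' '.join(word for word, tag in ne.leaves())
       for ne in chunked_nes if isinstance(ne, nltk.Tree)]
print(nes)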
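The question only asks for people and organizations, while ne_chunk also emits other entity types such as GPE (geo-political entities). One way to filter by type is to flatten the chunk tree into (word, POS, IOB) triples with `nltk.chunk.tree2conlltags` and group them. The helper below (`entities_from_iob`, a hypothetical name) is a minimal sketch of that grouping in plain Python; the sample triples are hand-written for illustration, not real chunker output. Whether a lowercase name like "sumit" is tagged as an entity in the first place still depends on the chunker.

```python
from typing import Iterable, List, Optional, Tuple


def entities_from_iob(
    triples: Iterable[Tuple[str, str, str]],
    keep: Tuple[str, ...] = ("PERSON", "ORGANIZATION"),
) -> List[Tuple[str, str]]:
    """Group (word, pos, iob) triples into (entity_text, label) pairs.

    Only entity types listed in `keep` are returned.
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far, if its type is wanted
        if words and label in keep:
            entities.append((" ".join(words), label))

    for word, _pos, iob in triples:
        if iob.startswith("B-") or (iob.startswith("I-") and iob[2:] != label):
            # A new entity starts: close the previous one first
            flush()
            words, label = [word], iob[2:]
        elif iob.startswith("I-"):
            # Continuation of the current entity
            words.append(word)
        else:
            # "O": outside any entity
            flush()
            words, label = [], None
    flush()
    return entities


# Hand-written example triples (not real chunker output)
sample = [("i", "PRP", "O"), ("met", "VBD", "O"),
          ("Samit", "NNP", "B-PERSON"), ("Kumar", "NNP", "I-PERSON"),
          ("at", "IN", "O"), ("Tata", "NNP", "B-ORGANIZATION"),
          ("Motors", "NNP", "I-ORGANIZATION"), ("in", "IN", "O"),
          ("Delhi", "NNP", "B-GPE")]
print(entities_from_iob(sample))
```

With the variables from the answer above, the call would be `entities_from_iob(tree2conlltags(chunked_nes))` after `from nltk.chunk import tree2conlltags`.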

However, every named-entity recognizer makes mistakes. If you really can't afford to miss a proper name, you could keep a dictionary of known proper names and check whether each token (or run of tokens) is contained in it.
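A minimal sketch of that lookup, assuming you maintain your own list of known person and organization names (the names used here are made up for illustration, and `find_known_names` is a hypothetical helper). Matching is case-insensitive, which is what lowercase SMS text needs, and multi-word entries such as organization names are tried longest-first.

```python
import re
from typing import Iterable, List


def find_known_names(text: str, known_names: Iterable[str]) -> List[str]:
    """Return spans of `text` that match an entry in `known_names`, ignoring case.

    Longer entries win, so "Tata Motors" is preferred over "Tata".
    """
    # Store each known name as a tuple of lower-cased words
    lookup = {tuple(name.lower().split()) for name in known_names}
    max_len = max((len(entry) for entry in lookup), default=0)
    tokens = re.findall(r"[A-Za-z]+", text)  # crude tokenizer: runs of letters only
    found: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest window starting at token i first
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            window = tokens[i:i + size]
            if tuple(t.lower() for t in window) in lookup:
                found.append(" ".join(window))
                i += size
                break
        else:
            # No known name starts here; move on one token
            i += 1
    return found


sentence = ("i spoke with sumit and rajesh and Samit about the gridlock "
            "situation last night @ around 8 pm last nite")
print(find_known_names(sentence, ["Sumit", "Rajesh", "Samit"]))
```

The trade-off is the reverse of a statistical recognizer: listed names are caught regardless of capitalization, but names missing from the list are never found, and a name that is also a common word will match wherever that word appears.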

Saheel Godhane · Answer

You might want to have a look at python-nameparser. It also tries to guess the correct capitalization of names. Sorry for the incomplete answer, but I don't have much experience with python-nameparser.

Best of luck!


Tags: python, nltk

Brij Raj Singh - MSFT

2 Answers

user278064

Saheel Godhane
