
Is there a way to remove proper nouns from a sentence using Python?

Is there a package I can use to remove proper nouns from a sentence using Python?

I know of a few packages, such as NLTK, Stanford, and TextBlob, that do the job (they remove names), but they also remove a lot of words that start with a capital letter but are not proper nouns.

Also, I cannot keep a dictionary of names, because it would be huge and would keep growing as more data is added to the DB.

Pri asked Sep 22 '16 08:09


1 Answer

If you just want to remove single words that are proper nouns, you can use NLTK to tag the sentence, then remove every word whose tag marks it as a proper noun.

>>> import nltk
>>> nltk.tag.pos_tag("I am named John Doe".split())
[('I', 'PRP'), ('am', 'VBP'), ('named', 'VBN'), ('John', 'NNP'), ('Doe', 'NNP')]

The default tagger uses the Penn Treebank POS tagset, which has only two proper noun tags: NNP (singular) and NNPS (plural).

So you can just do the following:

>>> sentence = "I am named John Doe"
>>> tagged_sentence = nltk.tag.pos_tag(sentence.split())
>>> edited_sentence = [word for word, tag in tagged_sentence if tag not in ('NNP', 'NNPS')]
>>> print(' '.join(edited_sentence))
I am named

One warning: POS tagging is not 100% accurate and may mistag ambiguous words. Also, because this approach filters one token at a time, it will not capture named entities as such, since they are often multiword in nature.
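As a rough way to work around the multiword limitation, consecutive proper-noun tokens can be grouped into a single span before they are dropped. The sketch below is only a heuristic and not real named entity recognition. It works on already-tagged `(word, tag)` pairs such as those returned by `nltk.tag.pos_tag`, and the names `PROPER_NOUN_TAGS` and `split_proper_noun_runs` are made up for this illustration.

```python
from itertools import groupby

# Penn Treebank proper noun tags (singular and plural).
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def split_proper_noun_runs(tagged_tokens):
    """Split (word, tag) pairs into kept words and removed proper-noun spans.

    Hypothetical helper: adjacent proper-noun tokens are joined into one
    span, so "John Doe" is reported as a single removed name.
    """
    kept, removed = [], []
    # groupby yields runs of adjacent tokens that share the same key value.
    for is_proper, group in groupby(
        tagged_tokens, key=lambda pair: pair[1] in PROPER_NOUN_TAGS
    ):
        words = [word for word, _tag in group]
        if is_proper:
            removed.append(" ".join(words))
        else:
            kept.extend(words)
    return kept, removed


# Tagged output copied from the pos_tag example above.
tagged = [('I', 'PRP'), ('am', 'VBP'), ('named', 'VBN'),
          ('John', 'NNP'), ('Doe', 'NNP')]
kept, removed = split_proper_noun_runs(tagged)
print(' '.join(kept))  # I am named
print(removed)         # ['John Doe']
```

The result is only as good as the tags it is given, so any word the tagger mislabels as NNP will still be removed.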

Nathan McCoy answered Sep 28 '22 02:09