Is there any package I can use to remove proper nouns from a sentence using Python?
I know of a few packages, like NLTK, Stanford and TextBlob, which do the job (they remove names), but they also remove a lot of words which start with a capital letter but are not proper nouns.
Also, I cannot keep a dictionary of names, because it would be huge and would keep growing as more data is added to the DB.
If you just want to remove single words that are proper nouns, you can use nltk to tag the sentence in question, then remove every word whose tag marks it as a proper noun.
>>> import nltk
>>> nltk.tag.pos_tag("I am named John Doe".split())
[('I', 'PRP'), ('am', 'VBP'), ('named', 'VBN'), ('John', 'NNP'), ('Doe', 'NNP')]
The default tagger uses the Penn Treebank POS tagset, which has only two proper noun tags: NNP (singular) and NNPS (plural).
So you can just do the following:
>>> sentence = "I am named John Doe"
>>> tagged_sentence = nltk.tag.pos_tag(sentence.split())
>>> edited_sentence = [word for word, tag in tagged_sentence if tag not in ('NNP', 'NNPS')]
>>> print(' '.join(edited_sentence))
I am named
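If you need this in more than one place, the filtering step can be pulled out into a small function. The following is only a sketch: the names strip_proper_nouns and PROPER_NOUN_TAGS are made up for illustration, and the tagged input is hard-coded from the pos_tag output shown above so that the snippet runs without nltk or its tagger data installed.

```python
from typing import Iterable, Tuple

# Penn Treebank tags for singular and plural proper nouns.
PROPER_NOUN_TAGS = frozenset({"NNP", "NNPS"})


def strip_proper_nouns(tagged_tokens: Iterable[Tuple[str, str]]) -> str:
    """Join every token whose tag is not a proper-noun tag.

    Hypothetical helper: expects (word, tag) pairs in the shape
    returned by nltk.tag.pos_tag.
    """
    kept = [word for word, tag in tagged_tokens if tag not in PROPER_NOUN_TAGS]
    return " ".join(kept)


# Tagged tokens copied from the pos_tag output earlier in this answer.
tagged = [("I", "PRP"), ("am", "VBP"), ("named", "VBN"),
          ("John", "NNP"), ("Doe", "NNP")]
print(strip_proper_nouns(tagged))  # I am named
```

In real use you would pass nltk.tag.pos_tag(sentence.split()) instead of the hard-coded list.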
Now, just as a warning: POS tagging is not 100% accurate and may mistag some ambiguous words. Also, this approach works one token at a time, so it does not recognize multiword named entities as single units.
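One rough way to approximate multiword names with nothing but the POS tags is to merge adjacent proper-noun tokens into a single span. This is a heuristic sketch, not real named-entity recognition: the function name proper_noun_runs is invented here, the input is again hard-coded from the tagger output above, and two unrelated names that happen to sit next to each other would be merged into one.

```python
from itertools import groupby

# Penn Treebank tags for singular and plural proper nouns.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def proper_noun_runs(tagged_tokens):
    """Return each run of adjacent proper-noun tokens as one string.

    Hypothetical helper: a tag-based heuristic, not a trained
    named-entity recognizer.
    """
    runs = []
    # groupby splits the sequence into maximal runs that share the same
    # key, here "is this token tagged as a proper noun?".
    for is_proper, group in groupby(
        tagged_tokens, key=lambda pair: pair[1] in PROPER_NOUN_TAGS
    ):
        if is_proper:
            runs.append(" ".join(word for word, _ in group))
    return runs


# Tagged tokens copied from the pos_tag output earlier in this answer.
tagged = [("I", "PRP"), ("am", "VBP"), ("named", "VBN"),
          ("John", "NNP"), ("Doe", "NNP")]
print(proper_noun_runs(tagged))  # ['John Doe']
```

Grouping this way lets you inspect or log which spans would be removed before actually dropping them.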