Hi, below is my code to remove stopwords and get the named entities for text that contains technology-related terms like java, lan, port, socket, etc.
import nltk
from nltk.corpus import stopwords


def stop_final():
    result = []
    text = "some technology related text"
    # Tokenize, drop English stopwords, then POS-tag and NE-chunk what remains.
    tokens = nltk.word_tokenize(text)
    for word in tokens:
        if word not in stopwords.words('english'):
            result.append(word)
    print(nltk.ne_chunk(nltk.pos_tag(result)))


stop_final()
With the above code I am getting PERSON entities for lan, socket, etc., so the results are not accurate. Please suggest how I can get correct named entities for my text.
Thanks
Late, but here goes. Also, this is not a solution, more an explanation of the problem and an attempt to point the reader in the right direction. Hope this helps someone.
NLTK uses a fixed list of stopwords in that module, so it will not filter everything you are looking for. You'll have to look at assigning POS tags to your words and filtering out the types that are irrelevant to your problem, for example as in the sketch below.
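A minimal sketch of POS-tag filtering with NLTK (the tag names follow the Penn Treebank tagset, and the sample sentence is made up for illustration; it assumes the usual NLTK data packages such as punkt and the POS tagger have been downloaded):

import nltk

text = "The server opens a socket on the LAN port and runs Java."
tokens = nltk.word_tokenize(text)
tagged = nltk.pos_tag(tokens)

# Keep only nouns (NN, NNS) and proper nouns (NNP, NNPS); drop everything else.
noun_tags = {"NN", "NNS", "NNP", "NNPS"}
candidates = [word for word, tag in tagged if tag in noun_tags]
print(candidates)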
However, you are looking for terms that can be both common nouns and proper nouns, so words like Java and lan would still get through. The problem is that POS tags do not carry the extra information you are looking for, i.e., that the words should be technology related.
This is an extremely difficult problem to solve with high accuracy, since you'll need to infer context from your text. It is a current research problem in the fields of Natural Language Processing (NLP) and Machine Learning.
Possible solutions may involve some of the following techniques.
You can start building your own stopword list, adding words to it as you spot them (after POS-tag filtering). This is tedious and error-prone, but simpler than the alternatives; a sketch follows.
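A minimal sketch of extending the NLTK stopword list with your own entries (the extra words here are hypothetical examples, not a curated list):

from nltk.corpus import stopwords

custom_stopwords = set(stopwords.words('english'))
custom_stopwords.update({"use", "new", "also"})  # grow this as you spot noise in your output


def remove_stopwords(tokens):
    # Lowercase before comparing, since the NLTK list is all lowercase.
    return [t for t in tokens if t.lower() not in custom_stopwords]


print(remove_stopwords(["Java", "also", "opens", "a", "socket"]))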
There are methods in NLP called named-entity recognition and resolution that you can look at; a crude gazetteer-based version is sketched below.
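A minimal sketch of checking the ne_chunk output against a hand-built gazetteer of technology terms (tech_terms is a hypothetical list you would maintain yourself; real entity-resolution systems are far more involved):

import nltk

tech_terms = {"java", "lan", "port", "socket"}


def tech_entities(sentence):
    tokens = nltk.word_tokenize(sentence)
    tree = nltk.ne_chunk(nltk.pos_tag(tokens))
    found = []
    for node in tree:
        if isinstance(node, nltk.Tree):            # a chunk such as PERSON or GPE
            phrase = " ".join(word for word, tag in node.leaves())
        else:                                      # a plain (word, tag) pair
            phrase = node[0]
        # Override NLTK's guess when the phrase is in our technology gazetteer.
        if phrase.lower() in tech_terms:
            found.append((phrase, "TECHNOLOGY"))
    return found


print(tech_entities("The Java client opens a socket on the LAN."))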
Also, check out Google's Ngram corpus viewer. They did some interesting things with that.