I am just doing some research into NLP with Python and I have identified something strange.
On review of the following negative tweets:
neg_tweets = [('I do not like this car', 'negative'),
('This view is horrible', 'negative'),
('I feel tired this morning', 'negative'),
('I am not looking forward to the concert', 'negative'),  # <--- this one
('He is my enemy', 'negative')]
I then do some processing to remove the stop words:
from nltk.corpus import stopwords

clean_data = []
stop_words = set(stopwords.words("english"))
for (words, sentiment) in pos_tweets + neg_tweets:
    words_filtered = [e.lower() for e in words.split() if e not in stop_words]
    clean_data.append((words_filtered, sentiment))
Part of the output is:
(['i', 'looking', 'forward', 'concert'], 'negative')
I'm struggling to understand why the stop words include 'not', which can change the sentiment of a tweet.
My understanding is that stop words carry no value in terms of sentiment.
So my question is: why is 'not' included in the stop words list?
Stop words in a sentence are "generally" of little or no use. As the Stanford NLP group puts it:
Sometimes, some extremely common words which would appear to be of little value in helping select documents matching a user need are excluded from the vocabulary entirely. These words are called stop words
Why the word "not"? Simply because it appears very often in English and is "usually" of little or no importance. For example, in text summarization these stop words contribute little, and the result is driven by the frequency distribution of words (as in tf-idf).
So what can you do? This falls under a broad topic known as negation handling, which has many different methods. One of my favorites is to simply attach each negation word to the word next to it before removing the stop words or computing word vectors. For example, you can convert "not looking" to "not_looking", which becomes a distinct token and so ends up with a quite different representation in vector space. You can find code for doing something similar in an SO answer here.
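As a rough illustration of that idea, here is a minimal sketch that glues a negation word to the word that follows it and only then filters stop words. The names `STOP_WORDS`, `NEGATIONS`, `join_negations`, and `clean` are made up for this example, and the tiny stop word set is a stand-in so the snippet runs without downloading NLTK data; with NLTK you would use `set(stopwords.words("english"))` instead.

```python
import re

# Hypothetical, deliberately tiny stop word list so the sketch is self-contained.
# In practice you would use set(stopwords.words("english")) from NLTK.
STOP_WORDS = {"i", "am", "to", "the", "is", "my", "this", "do", "not", "no", "never"}

# Assumed set of negation cues; extend it to suit your data.
NEGATIONS = {"not", "no", "never"}


def join_negations(text):
    """Lowercase, tokenize, and glue each negation cue to the following word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    joined = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            # "not", "looking" -> "not_looking"
            joined.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            joined.append(tokens[i])
            i += 1
    return joined


def clean(text):
    """Join negations first, then drop stop words."""
    return [t for t in join_negations(text) if t not in STOP_WORDS]


print(clean("I am not looking forward to the concert"))
# ['not_looking', 'forward', 'concert']
```

Because the joined token `not_looking` is not in the stop word list, the negation survives the filtering step. Note that this sketch lowercases before filtering, so 'I' is removed as well.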
I hope this helps!