
Algorithm to determine how positive or negative a statement/text is

Tags: algorithm, nlp

I need an algorithm to determine if a sentence, paragraph or article is negative or positive in tone... or better yet, how negative or positive.

For instance:

Jason is the worst SO user I have ever witnessed (-10)

Jason is an SO user (0)

Jason is the best SO user I have ever seen (+10)

Jason is the best at sucking with SO (-10)

While, okay at SO, Jason is the worst at doing bad (+10)

Not easy, huh? :)

I don't expect somebody to explain this algorithm to me, but I assume there is already much work on something like this in academia somewhere. If you can point me to some articles or research, I would love it.

Thanks.

Jason asked Nov 15 '08 20:11



2 Answers

There is a sub-field of natural language processing called sentiment analysis that deals specifically with this problem domain. A fair amount of commercial work is done in the area because consumer products are so heavily reviewed in online user forums (UGC, or user-generated content). There is also a prototype platform for text analytics called GATE, from the University of Sheffield, and a Python project called NLTK. Both are considered flexible, but not very high-performance. Either one might be good for working out your own ideas.

fawce answered Sep 24 '22 19:09


My company has a product that does this and performs well; I did most of the work on it. Here is a brief outline:

You need to split the paragraph into sentences and then split each sentence into smaller sub-sentences, splitting on commas, hyphens, semicolons, colons, 'and', 'or', etc. In some cases each sub-sentence will express a totally separate sentiment.

Some sentences, even after being split, will have to be joined back together.

E.g.: The product is amazing, excellent and fantastic.

We have developed a comprehensive set of rules on the types of sentences that need to be split and those that shouldn't be (based on the part-of-speech (POS) tags of the words).
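The splitting step could be sketched roughly as follows. This is only an illustration, not the product's rule set: the answer's POS-based rules are not published, so the re-joining here uses a crude stand-in (a fragment shorter than `min_words` words is glued back onto the previous fragment). The function names, the regular expressions and the `min_words` threshold are all assumptions made for this example.

```python
import re

# Delimiters named in the answer: comma, hyphen, semicolon, colon, 'and', 'or'.
# A hyphen only counts when surrounded by spaces, so hyphenated words stay intact.
CLAUSE_SPLIT = re.compile(
    r"\s*[,;:]\s*(?:(?:and|or)\s+)?|\s+-\s+|\s+(?:and|or)\s+",
    re.IGNORECASE,
)
# Naive sentence boundary: whitespace that follows '.', '!' or '?'.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split a paragraph into sentences on end-of-sentence punctuation."""
    return [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]


def split_sub_sentences(sentence, min_words=2):
    """Split a sentence into sub-sentences, then re-join very short fragments.

    The re-join rule is a hypothetical placeholder for the POS-based rules
    mentioned in the answer: a fragment with fewer than `min_words` words is
    treated as part of a list and appended to the previous fragment.
    """
    pieces = [p.strip(" .!?") for p in CLAUSE_SPLIT.split(sentence)]
    pieces = [p for p in pieces if p]
    merged = []
    for piece in pieces:
        if merged and len(piece.split()) < min_words:
            merged[-1] = merged[-1] + " " + piece
        else:
            merged.append(piece)
    return merged
```

With this sketch, the example sentence above stays in one piece because "excellent" and "fantastic" are single-word fragments, while "The screen is great, but the battery is terrible." is split into two sub-sentences that can be scored separately.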

On the first level, you can use a bag-of-words approach: keep a list of positive and negative words/phrases and check for them in every sub-sentence. While doing this, also look for negation words like 'not', 'no', etc., which will change the polarity of the sentence.
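A minimal sketch of that first-level check might look like this. The word lists are tiny placeholders invented for the example, and the three-token negation window is an assumption; the answer does not say how far back a negation word should reach.

```python
import re

# Tiny illustrative word lists (assumptions); a real system needs far larger ones.
POSITIVE = {"good", "great", "best", "amazing", "excellent", "fantastic"}
NEGATIVE = {"bad", "worst", "terrible", "awful", "poor"}
NEGATIONS = {"not", "no", "never"}


def is_negation(token):
    """True for listed negation words and for contractions ending in n't."""
    return token in NEGATIONS or token.endswith("n't")


def score_sub_sentence(text, window=3):
    """Return a signed score for one sub-sentence.

    Each positive word counts +1 and each negative word counts -1. The sign
    of a word is flipped when a negation appears among the `window` tokens
    immediately before it.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if polarity == 0:
            continue
        preceding = tokens[max(0, i - window):i]
        if any(is_negation(t) for t in preceding):
            polarity = -polarity
        score += polarity
    return score
```

Scoring "Jason is the best SO user I have ever seen" gives a positive score and "This is not good" gives a negative one. The trickier examples from the question show the limits: "Jason is the best at sucking with SO" also scores positive here, because the lexicon only sees "best".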

If you still can't find the sentiment, you can fall back on a naive Bayes approach. On its own this approach is not very accurate (about 60%). But if you apply it only to sentences that fail to pass through the first set of rules, you can easily get to 80-85% accuracy.
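The fallback idea can be illustrated with a small multinomial naive Bayes classifier in plain Python. This is a sketch under stated assumptions, not the Reverend-based implementation the answer refers to: the class, the `classify_with_fallback` helper and the `"pos"`/`"neg"` labels are hypothetical names, and the rule-based scorer is passed in as any function that returns a signed number (zero meaning "no decision").

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        """Return the label with the highest log-probability (None if untrained)."""
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_logp = None, -math.inf
        for label in self.doc_counts:
            logp = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


def classify_with_fallback(text, rule_score, model):
    """Trust the rule-based score when it is decisive; otherwise ask the model."""
    score = rule_score(text)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return model.classify(text)
```

The design point is the one the answer makes: the classifier is only consulted for sub-sentences where the word lists give no signal, so its weaker accuracy affects a smaller share of the input. The accuracy figures quoted above are the answer author's own and are not reproduced by this toy example.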

The important parts are the positive/negative word list and the way you split things up. If you want, you can go a level higher by implementing an HMM (Hidden Markov Model) or a CRF (Conditional Random Field). But I am not a pro in NLP, and someone else may fill you in on that part.

For the curious, we implemented all of this in Python with NLTK and the Reverend Bayes module.

It is pretty simple and handles most sentences. You may, however, face problems when trying to tag content from the web, since most people don't write proper sentences there. Also, handling sarcasm is very hard.

cnu answered Sep 24 '22 19:09