NLP: difference between using NLTK's sentiment analysis and using an ML approach

I recently started to learn NLP and ML using Python, beginning with sentiment analysis. I'm having trouble understanding where machine learning comes into play when doing sentiment analysis.

Let's say I'm analyzing tweets or news headlines using NLTK's SentimentIntensityAnalyzer, loading case-relevant lexicons so I get polarity scores: negative, positive, neutral, and compound. What I don't understand is in which cases I should instead use code like in this article:

Sentiment with ML tutorial

or should I just use the built-in analyzer in NLTK, or even something like Google's BERT?

Any answer or link to a blog post or tutorial would be welcome!


1 Answer

SentimentIntensityAnalyzer is a tool built specifically for analyzing sentiment. It is easy to use, but it can miss some cases, for example:

In [52]: from nltk.sentiment.vader import SentimentIntensityAnalyzer                                                

In [53]: sia = SentimentIntensityAnalyzer()                                                                         

In [54]: sia.polarity_scores("I am not going to miss using this product.")                                          
Out[54]: {'neg': 0.0, 'neu': 0.829, 'pos': 0.171, 'compound': 0.1139}

The sentence actually expresses negative sentiment (the speaker is glad to stop using the product), yet the compound score comes out slightly positive because the negated "miss" is not handled well.

A machine learning approach, like the one outlined in your link, is more involved: it focuses on creating features, often using TF-IDF (but certainly not limited to it), and then a machine learning model is trained on top of those features. This approach relies on the availability of a good enough and large enough training dataset. Often the feature extraction is the more important part, and a simple model such as Logistic Regression is chosen.
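A minimal sketch of that approach, assuming scikit-learn (which your link may or may not use) and hypothetical toy data:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data for illustration only; a real dataset would be much larger.
texts = [
    "I love this product, it works great",
    "Absolutely fantastic, would buy again",
    "Terrible quality, broke after a day",
    "I am not going to miss using this product.",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# TF-IDF turns raw text into numeric features;
# Logistic Regression is the simple model trained on top of them.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["What a great purchase"]))

Whether a model like this handles tricky negation depends entirely on the training data it sees, which is why the dataset matters more than the model here.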

BERT is a pretrained model that can be fine-tuned, though it doesn't have to be; in my experience, fine-tuning helps.
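As a minimal sketch of using a pretrained model without any fine-tuning at all (this assumes the Hugging Face transformers library, which is one common way to run BERT-style models, not necessarily the only one):

from transformers import pipeline

# Downloads a default pretrained sentiment model on first use.
classifier = pipeline("sentiment-analysis")

# Returns a list like [{'label': ..., 'score': ...}] for each input.
print(classifier("I am not going to miss using this product."))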

The main advantages of BERT:

  1. With enough training data, BERT can be very powerful; it should be able to classify the example at the beginning of this post correctly. This is a huge advantage.

  2. Since BERT is already pretrained, it might require a relatively small number of training samples to give reasonable results (see the fine-tuning sketch after this list).

  3. Because BERT requires no feature engineering (or a lot less of it), the ML engineering work needed to get good initial results can be fast.
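A minimal fine-tuning sketch for point 2, again assuming the Hugging Face transformers library and hypothetical toy data (a real run would use a proper labeled dataset):

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
import torch

# Toy data for illustration; in practice even a few hundred to a few
# thousand labeled examples can already give reasonable results.
texts = ["Great product, works perfectly",
         "I am not going to miss using this product."]
labels = [1, 0]  # 1 = positive, 0 = negative

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert_sentiment", num_train_epochs=3),
    train_dataset=ToyDataset(encodings, labels),
)
trainer.train()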

The main limitations of BERT are:

  1. Learning curve, mostly in conceptually understanding how it works. Actually using BERT is not very hard.

  2. BERT is slow to train and to run predictions with. You pretty much need at least a moderate GPU, even for a small dataset.

  3. Lack of transparency. It is really hard to know why a BERT-based model predicts what it predicts.

Akavall, answered Sep 02 '25