I recently started learning NLP and ML using Python, beginning with sentiment analysis. I'm having trouble understanding where machine learning comes into play when doing sentiment analysis.
Let's say I'm analyzing tweets or news headlines using NLTK's SentimentIntensityAnalyzer, and I'm loading case-relevant lexicons, so I'm getting negative, positive, neutral, and compound polarity scores. What I don't understand is: in which cases should I use code like in this article:
Sentiment with ML tutorial
versus just using the built-in tools in NLTK, or even something like Google's BERT?
Any answer or link to a blog or tutorial would be welcome!
SentimentIntensityAnalyzer
is a tool built specifically for analyzing sentiment. It is easy to use, but it can miss some cases, for example:
In [52]: from nltk.sentiment.vader import SentimentIntensityAnalyzer
In [53]: sia = SentimentIntensityAnalyzer()
In [54]: sia.polarity_scores("I am not going to miss using this product.")
Out[54]: {'neg': 0.0, 'neu': 0.829, 'pos': 0.171, 'compound': 0.1139}
The sentence actually expresses negative sentiment (the speaker won't miss the product), yet VADER scores it as mildly positive because its lexicon-based rules don't resolve this kind of negation.
A machine learning approach, like the one outlined in your link, is more involved: it focuses on creating features, often using TF-IDF (though certainly not limited to it), and then a machine learning model is trained on top of those features. This approach relies on the availability of a good enough and large enough training dataset. Often feature extraction is the more important part, and a simple model, like logistic regression, is chosen.
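To make that concrete, here is a minimal sketch of the TF-IDF + logistic regression approach using scikit-learn. The tiny dataset is made up purely for illustration; a real project would need a much larger labeled corpus:

```python
# Sketch of the feature-based ML approach: TF-IDF features
# plus a logistic regression classifier (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data for illustration only.
texts = [
    "I love this product, it works great",
    "Absolutely fantastic experience",
    "Best purchase I have made this year",
    "Highly recommend, very satisfied",
    "This is terrible, complete waste of money",
    "Worst product ever, very disappointed",
    "It broke after one day, awful",
    "Do not buy this, horrible quality",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive, 0 = negative

# TF-IDF turns each text into a sparse feature vector;
# logistic regression learns weights over those features.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(texts, labels)

print(model.predict(["what a great purchase", "this was a waste"]))
```

The pipeline object bundles feature extraction and the classifier, so the same `fit`/`predict` interface covers both steps; swapping in a different model (e.g. a linear SVM) is a one-line change.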
BERT is a pretrained model that can be fine-tuned, though it doesn't have to be; in my experience, fine-tuning helps.
The main advantages of BERT:
With enough training data, BERT can be very powerful; it should be able to classify the example at the beginning of my post correctly. This is a huge advantage.
Since BERT is already pretrained, it may require a relatively small number of training samples to give reasonable results.
Because BERT requires little or no feature engineering, getting good initial results can be fast in terms of ML engineering work.
The main limitations of BERT are:
Learning curve, mostly in conceptually understanding how it works; actually using BERT is not very hard.
BERT is slow to train and predict. You pretty much need at least a moderate GPU, even for a small dataset.
Lack of transparency. It is really hard to know why a BERT-based model predicts what it predicts.
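For comparison with the VADER snippet above, here is a sketch of using a pretrained transformer through the Hugging Face `transformers` library (not mentioned in the original links, so treat it as one possible route). The default `"sentiment-analysis"` pipeline downloads a DistilBERT model fine-tuned on SST-2; no feature engineering is needed, but inference is much slower than a lexicon lookup:

```python
# Sketch: sentiment with a pretrained transformer via the
# Hugging Face `transformers` library. The default pipeline
# fetches a DistilBERT model fine-tuned for binary sentiment.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
result = classifier("I am not going to miss using this product.")[0]
print(result["label"], round(result["score"], 3))
```

`result` is a dict with a `label` (`"POSITIVE"` or `"NEGATIVE"` for this model) and a confidence `score`; unlike VADER, the model has a chance of picking up the negation, though as noted above it is hard to inspect why it decides what it decides.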