I have some text data that I need to classify by sentiment. The data is unlabelled: I don't have positive or negative labels for it. I want to use the Gensim word2vec model for the sentiment classification.
Is it possible to do this? So far I haven't found anything that does it.
Every blog and article I've found uses some kind of labelled dataset (such as the IMDB dataset) to train and test the word2vec model. None of them goes further and predicts sentiment on their own unlabelled data.
Can someone tell me whether this is possible, at least in theory?
Thanks in Advance!
If the text is simple (and you are not tied to word2vec), you can classify it with the VADER model regardless of labels. VADER is a lexicon- and rule-based sentiment analyzer, so it needs no training data: you just pass the text to its API.
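import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
nltk.download('vader_lexicon')  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()  # polarity_scores is an instance method
a = 'This is a good movie'
print(sia.polarity_scores(a))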
which returns the result below:
{'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}
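VADER only returns scores, so you still have to turn them into a class. A common approach, and the one suggested in the VADER README, is to threshold the `compound` score at ±0.05. The helper below (`vader_label` is a hypothetical name, not part of NLTK) is a minimal sketch of that rule, applied to the dict shown above:

```python
def vader_label(scores, threshold=0.05):
    """Map a VADER polarity_scores() dict to a coarse sentiment label.

    Uses only the 'compound' score with the +/-0.05 cut-offs
    suggested in the VADER README; the threshold is adjustable.
    """
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# scores copied from the output above
print(vader_label({'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}))
```

For this example the compound score of 0.4404 is above 0.05, so the label is "positive". You may want to tune the threshold on a small hand-checked sample of your own data.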
Yes. As with most machine learning problems, there are two main approaches to sentiment analysis: supervised and unsupervised.
In supervised sentiment analysis you definitely need a labelled dataset; with one, you can train a simple logistic regression or a deep learning model such as an LSTM. In unsupervised sentiment analysis you don't need any labelled data; instead you can use a clustering algorithm, and K-Means clustering is a popular choice for this task. The following Medium article contains a worked example that fits your problem:
https://towardsdatascience.com/unsupervised-sentiment-analysis-a38bf1906483
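To make the clustering idea concrete, here is a minimal sketch of K-Means (plain Lloyd's algorithm in NumPy) splitting documents into two groups. It is not necessarily the exact method used in the linked article. It assumes each document has already been turned into a fixed-size vector, for example by averaging its word2vec vectors; the 2-D `docs` array is hypothetical toy data. In practice you would more likely use scikit-learn's `KMeans`.

```python
import numpy as np


def kmeans(X, k=2, n_iter=100, seed=0):
    """Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # start from k distinct documents picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(n_iter):
        # Euclidean distance from every document to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # stop once no document changes cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # move each centroid to the mean of its members (keep it if the cluster is empty)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# hypothetical document vectors: two well-separated groups
docs = np.array([
    [0.9, 0.8], [1.0, 0.9], [0.8, 1.0],
    [-0.9, -0.7], [-1.0, -0.8], [-0.8, -1.0],
])
labels, centroids = kmeans(docs, k=2)
print(labels)
```

Note that K-Means only returns cluster ids such as 0 and 1. Deciding which cluster is "positive" and which is "negative" still requires inspecting a few documents (or words) from each cluster by hand.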
To add to your question: word embeddings such as word2vec or fastText have nothing to do with whether the sentiment analysis is supervised or unsupervised. They are a very powerful way to represent the features of your dataset, and can feed either kind of model. By the way, in my experience fastText is more accurate than word2vec.
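One simple way to use word embeddings as features, whatever model comes next, is to average the vectors of a document's words into a single document vector. The sketch below does that. `document_vector` and the 3-d `toy_wv` embeddings are hypothetical; with Gensim you would pass a trained model's `model.wv` (a `KeyedVectors`, which supports `word in wv` and `wv[word]`) and its `vector_size` instead.

```python
import numpy as np


def document_vector(tokens, wv, dim):
    """Average the vectors of the tokens that exist in `wv`.

    `wv` can be anything supporting `word in wv` and `wv[word]`, e.g. a
    gensim KeyedVectors (model.wv) or a plain dict of numpy arrays.
    Returns a zero vector if none of the tokens are in the vocabulary.
    """
    known = [wv[t] for t in tokens if t in wv]
    if not known:
        return np.zeros(dim)
    return np.mean(known, axis=0)


# hypothetical toy embeddings standing in for trained word2vec vectors
toy_wv = {
    "good": np.array([1.0, 0.0, 0.0]),
    "movie": np.array([0.0, 1.0, 0.0]),
}
vec = document_vector("this is a good movie".split(), toy_wv, dim=3)
print(vec)
```

Only "good" and "movie" are in the toy vocabulary, so the result is their mean. Vectors like this can then be fed to a logistic regression if you have labels, or to K-Means if you don't.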