
Is it possible to do sentiment analysis of unlabelled text using a word2vec model?

I have some text data on which I need to do sentiment classification. The data has no positive or negative labels (it is unlabelled). I want to use the Gensim word2vec model for the classification.
Is this possible? So far I couldn't find anything that does it. Every blog and article uses some kind of labelled dataset (such as the IMDB dataset) to train and test the word2vec model. No one goes further and predicts on their own unlabelled data.

Can someone tell me whether this is possible (at least theoretically)?

Thanks in advance!

Piyush Ghasiya asked Oct 16 '22 04:10



2 Answers

If the text is simple and you are not tied to word2vec, you can classify it with the VADER model, which needs no labels. You just pass the text to the API.

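import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# One-time download of the lexicon that VADER scores against
nltk.download('vader_lexicon')

# polarity_scores is an instance method, so create an analyzer first
sia = SentimentIntensityAnalyzer()

a = 'This is a good movie'

print(sia.polarity_scores(a))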

which returns the result below:

{'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}
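
To turn that output into a single label for each unlabelled text, one common convention is to threshold the compound score. This is a minimal sketch, assuming the ±0.05 cut-offs suggested in VADER's own documentation; the function name label_from_compound is hypothetical, and the cut-offs may need tuning for your data.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label.

    The default threshold is an assumption taken from VADER's documented
    convention, not something stated in the answer above.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# The scores reported above for 'This is a good movie'
scores = {'neg': 0.0, 'neu': 0.58, 'pos': 0.42, 'compound': 0.4404}
print(label_from_compound(scores['compound']))  # prints "positive"
```

In practice you would call this on the compound value from polarity_scores for each of your texts.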
Santosh K answered Oct 21 '22 06:10



Yes. As with any machine learning problem, there are two main approaches to sentiment analysis: supervised and unsupervised. The supervised approach definitely needs a labelled dataset; with one, you can use simple logistic regression or a deep learning model such as an LSTM. The unsupervised approach needs no labelled data; there you can use a clustering algorithm, and K-Means is a popular choice for this task. The following Medium article contains a worked example:

https://towardsdatascience.com/unsupervised-sentiment-analysis-a38bf1906483
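
The clustering idea can be sketched roughly as follows. This is not the article's code: the 2-D vectors are hypothetical stand-ins for real embeddings (with Gensim they would come from a trained model, e.g. model.wv[word]), k-means is written out in plain Python, and the two centroids are initialised at the assumed seed words "good" and "bad" so that the resulting clusters can be named.

```python
import math

# Hypothetical 2-D word vectors standing in for real word2vec embeddings.
word_vectors = {
    "good": [0.9, 0.8],
    "great": [1.0, 0.9],
    "excellent": [0.8, 1.0],
    "bad": [-0.9, -0.8],
    "awful": [-1.0, -0.9],
    "terrible": [-0.8, -1.0],
}


def kmeans(vectors, centroids, iterations=10):
    """Plain k-means; returns final centroids and a cluster index per vector."""
    centroids = [list(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        assignments = [
            min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


words = list(word_vectors)
vectors = [word_vectors[w] for w in words]

# Cluster 0 starts at "good", cluster 1 at "bad" (seed words are an assumption).
_, assignments = kmeans(vectors, [word_vectors["good"], word_vectors["bad"]])
word_label = {
    w: ("positive" if a == 0 else "negative") for w, a in zip(words, assignments)
}


def classify(text):
    """Label a text by majority vote over its words that belong to a cluster."""
    votes = [word_label[t] for t in text.lower().split() if t in word_label]
    if not votes:
        return "unknown"
    positive = votes.count("positive")
    negative = len(votes) - positive
    if positive == negative:
        return "neutral"
    return "positive" if positive > negative else "negative"


print(classify("This is a good movie"))  # prints "positive"
```

With real embeddings the clusters are rarely this clean, so it is worth inspecting the words nearest each centroid before trusting the labels.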

To add to your question: word embeddings such as word2vec or fastText are not tied to either supervised or unsupervised sentiment analysis. They are very powerful ways to represent the features of your dataset, whichever approach you use. BTW, in my experience fastText is more accurate than word2vec.
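
As an illustration of embeddings as features, one simple option is to average the word vectors of a text into a single document vector, which can then be fed to a clustering algorithm or to a classifier. This is only a sketch: the 3-D embeddings are hypothetical stand-ins for vectors from a trained word2vec or fastText model, and document_vector is an illustrative helper name.

```python
# Hypothetical 3-D embeddings; real ones would come from a trained model.
embeddings = {
    "good": [0.9, 0.1, 0.3],
    "movie": [0.1, 0.8, 0.2],
    "bad": [-0.9, 0.1, 0.3],
}


def document_vector(text, embeddings):
    """Average the vectors of the known words; return None if none is known."""
    known = [embeddings[t] for t in text.lower().split() if t in embeddings]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


# Only "good" and "movie" are in the vocabulary, so they are averaged.
vec = document_vector("This is a good movie", embeddings)
print(vec)
```

Averaging ignores word order, so it is a baseline representation rather than the only way to use embeddings.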

Lahiru answered Oct 21 '22 06:10
