I am working on sentiment analysis using the dataset at this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html
I have split my dataset 50:50, so 50% of the samples are used for training and 50% for testing. I extract features from the training samples and classify them with a Weka classifier, but my prediction accuracy is only about 70-75%.
Can anybody suggest other datasets that could help me improve the results? I have used unigrams, bigrams and POS tags as my features.
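For readers unfamiliar with n-gram features, here is a minimal plain-Python sketch of bag-of-unigrams-and-bigrams counting for one review. The regex tokenizer is a simplifying assumption, and POS-tag features are not covered because they would need a tagger (for example NLTK). In Weka, the StringToWordVector filter combined with an NGramTokenizer plays a similar role.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_features(text, n_max=2):
    """Count all n-grams of length 1..n_max (unigrams and bigrams by default)."""
    tokens = tokenize(text)
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


print(ngram_features("Not good, not good at all"))
```

Bigrams such as "not good" capture negation that unigrams alone miss, which is one reason they often help in sentiment tasks.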
Sentiment analysis (or opinion mining) is a natural language processing (NLP) technique used to determine whether data is positive, negative or neutral. Sentiment analysis is often performed on textual data to help businesses monitor brand and product sentiment in customer feedback, and understand customer needs.
Customer-experience data for sentiment analysis, such as consumer comments and reviews, can be gathered from review websites such as Google reviews, Superpages, Demandforce and Clutch.
Social media are the main source. The most common use of sentiment analysis is classifying a text into a class. Depending on the dataset and the purpose, sentiment classification can be a binary (positive or negative) or a multi-class (three or more classes) problem.
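As a sketch of how the same review data can be framed as either problem, assume each review carries a 1-5 star rating. The thresholds below are hypothetical choices, not a property of any particular dataset: 4-5 stars is positive, 1-2 is negative, and a 3-star review is either a "neutral" class (multi-class) or dropped (binary).

```python
def rating_to_label(rating, multiclass=False):
    """Map a 1-5 star rating to a sentiment label (hypothetical thresholds)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    # 3-star reviews: a third class in the multi-class setup, discarded (None) in the binary one.
    return "neutral" if multiclass else None


# Hypothetical toy reviews as (rating, text) pairs.
reviews = [(5, "Excellent"), (3, "It is okay"), (1, "Broke after a day")]
binary = [(text, rating_to_label(r)) for r, text in reviews if rating_to_label(r) is not None]
three_class = [(text, rating_to_label(r, multiclass=True)) for r, text in reviews]
print(binary)
print(three_class)
```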
There are many sources of sentiment analysis datasets.
Note, however, that another dataset will not necessarily improve the accuracy on your current dataset, because its corpus might be very different from yours. Apart from reducing the test percentage relative to training, you could test other classifiers, or tune the hyperparameters with a semi-automated wrapper such as CVParameterSelection or GridSearch, or even Auto-WEKA if it fits.
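The idea behind wrappers like CVParameterSelection and GridSearch can be illustrated outside Weka: train one model per candidate hyperparameter value and keep the value with the best accuracy on held-out validation data. The sketch below is not Weka's API; it assumes a hand-rolled multinomial naive Bayes whose tuned hyperparameter is the smoothing value alpha, and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha):
    """Multinomial naive Bayes with additive smoothing alpha (alpha must be > 0)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab, alpha


def predict_nb(model, tokens):
    class_counts, word_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y in sorted(class_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[y].values())
        score = math.log(class_counts[y] / total_docs)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[y][t] + alpha) / (total_words + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


def accuracy(model, docs, labels):
    return sum(predict_nb(model, d) == y for d, y in zip(docs, labels)) / len(labels)


def grid_search(train, val, alphas):
    """Return (best_alpha, best_validation_accuracy); the first alpha wins ties."""
    best = None
    for a in alphas:
        acc = accuracy(train_nb(train[0], train[1], a), val[0], val[1])
        if best is None or acc > best[1]:
            best = (a, acc)
    return best


# Hypothetical toy data: pre-tokenized reviews with binary labels.
train_docs = [["great", "movie"], ["loved", "it"], ["awful", "plot"], ["boring", "awful"]]
train_labels = ["pos", "pos", "neg", "neg"]
val_docs = [["great", "movie"], ["awful", "plot"]]
val_labels = ["pos", "neg"]

best_alpha, best_acc = grid_search((train_docs, train_labels), (val_docs, val_labels), [0.1, 0.5, 1.0])
print(best_alpha, best_acc)
```

On real data you would evaluate the chosen alpha once more on a separate test set, since the validation accuracy was used for selection and is optimistically biased.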
A 50/50 split is quite rare; 80/20 is a much more common ratio. A better practice is to use 60% for training, 20% for cross-validation (a validation set for tuning) and 20% for testing.
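A minimal sketch of the 60/20/20 split, assuming the reviews fit in a Python list. The fixed seed is an arbitrary choice that makes the split reproducible, and integer arithmetic avoids floating-point rounding in the split sizes.

```python
import random


def split_60_20_20(items, seed=42):
    """Shuffle a copy of items and split it 60/20/20 into train/validation/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 6 // 10
    n_val = n * 2 // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # any remainder from rounding goes to test
    return train, val, test


train, val, test = split_60_20_20(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

To keep the positive/negative balance identical in each part, one option is to apply the same function separately to the positive and the negative reviews and concatenate the results.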
I started gathering sentiment analysis tools, datasets and lexicons in one place; it could be useful for you too: https://github.com/laugustyniak/awesome-sentiment-analysis
Open a PR if you want to add something, or just write to me. I have worked a lot with Amazon data (millions of reviews).