Good dataset for sentiment analysis? [closed]

I am working on sentiment analysis using the dataset at http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html. I split the dataset 50:50: half of the samples are used for training and half for testing. I extract features from the training samples and classify with a Weka classifier, but my prediction accuracy is only about 70-75%.

Can anybody suggest other datasets that could help me improve this result? I have used unigrams, bigrams and POS tags as my features.

576

asked Jul 07 '14 08:07

user3512562

2 Answers

There are many sources of sentiment analysis datasets:

A huge n-grams dataset from Google: storage.googleapis.com/books/ngrams/books/datasetsv2.html
http://www.sananalytics.com/lab/twitter-sentiment/
http://inclass.kaggle.com/c/si650winter11/data
http://nlp.stanford.edu/sentiment/treebank.html
Or you can look into this global ML dataset repository: https://archive.ics.uci.edu/ml

That said, a different dataset will not necessarily give you better accuracy on your current one, because the new corpus might be very different from yours. Apart from holding out a smaller share of the data for testing and using more for training, you could try other classifiers, or tune the hyperparameters with a semi-automated wrapper such as CVParameterSelection or GridSearch, or even Auto-WEKA if it fits.
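To make the tuning idea concrete, here is a minimal, library-free sketch of what such wrappers automate: try every parameter combination, score each with k-fold cross-validation, and keep the best. It is not Weka's API. The names `k_fold_indices`, `grid_search_cv`, `fit`, `predict` and the toy threshold classifier are all hypothetical, chosen only for illustration.

```python
import itertools
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all indices
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]


def grid_search_cv(X, y, fit, predict, param_grid, k=5):
    """Return (best_params, best_accuracy) over every parameter combination."""
    names = sorted(param_grid)
    best_params, best_acc = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        correct = 0
        for train, val in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            correct += sum(predict(model, X[i]) == y[i] for i in val)
        acc = correct / len(X)  # each sample is validated exactly once
        if acc > best_acc:      # strict '>' keeps the first of any tied settings
            best_params, best_acc = params, acc
    return best_params, best_acc


# Toy stand-in for a real classifier (assumption): each "document" is already
# reduced to one sentiment score, and the only hyperparameter is a cut-off.
def fit(X_train, y_train, threshold):
    return {"threshold": threshold}


def predict(model, x):
    return "pos" if x > model["threshold"] else "neg"


X = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
y = ["pos" if x > 0 else "neg" for x in X]
best_params, best_acc = grid_search_cv(X, y, fit, predict, {"threshold": [-2, 0, 2]})
print(best_params, best_acc)
```

In practice you would plug in your real feature extraction and classifier for `fit` and `predict`, and run the search on the training portion only so the test set stays untouched.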

A 50/50 split is quite rare; 80/20 is a much more common ratio. A better practice is to use 60% for training, 20% for validation, and 20% for testing.
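A 60/20/20 split can be sketched in a few lines of plain Python. This is only an illustration: the function name `train_val_test_split`, the fixed seed, and the placeholder `reviews` list are assumptions, not part of Weka or of the dataset above.

```python
import random


def train_val_test_split(samples, train=0.6, val=0.2, seed=42):
    """Shuffle samples and split them into train/validation/test lists."""
    if not (0 < train < 1 and 0 < val < 1 and train + val < 1):
        raise ValueError("train and val must be fractions summing to less than 1")
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],        # whatever remains is the test set
    )


# Placeholder data (assumption): 100 labelled reviews, half "pos" and half "neg".
reviews = [(f"review {i}", "pos" if i % 2 == 0 else "neg") for i in range(100)]
train_set, val_set, test_set = train_val_test_split(reviews)
print(len(train_set), len(val_set), len(test_set))
```

Tune features and hyperparameters against the validation set, and evaluate on the test set only once at the end. For a dataset with unbalanced classes, a stratified split that preserves the class ratio in each part would be a better choice than this plain shuffle.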

188

answered Nov 18 '22 23:11

doxav

I have started gathering sentiment analysis tools, datasets and lexicons in one place; it could be useful for you too: https://github.com/laugustyniak/awesome-sentiment-analysis

Open a PR if you want to add something, or just write to me. I have worked a lot with Amazon data (millions of reviews).

answered Nov 19 '22 00:11

l.augustyniak

Tags: dataset, sentiment-analysis, web-mining
