
Creating a sentiment analysis tool

I'm trying to create a sentiment analysis tool to analyse tweets about Manchester United football club over a three-day period and determine whether people view the club positively or negatively. I am currently following this guide (with Java as my coding language):

http://cavajohn.blogspot.co.uk/2013/05/how-to-sentiment-analysis-of-tweets.html

I am using Apache Flume to download my tweets into Apache Hadoop, and I intend to use Apache Hive to query them. I may also use Apache Oozie to partition the tweets effectively.

The link above mentions that I need a training dataset to train the classifier that will analyse the tweets. The sample classifier provided comes with some 5,000 tweets. As I am doing this as a summer project for uni, I feel I should probably create my own dataset.

What is the minimum number of tweets I should use to make this classifier effective? Is there a recommended number? For example, if I manually labelled a hundred tweets, or five hundred, or a thousand, would that be enough?

asked by Andrew Martin


1 Answer

There is no exact number of examples needed to train a classifier. You can have a large dataset in which all the data shares the same attributes, so your classifier will merely memorise a pattern; or you can have a smaller dataset with good, varied instances, so your classifier will generalise better.

You can train the classifier using the sample dataset provided in the post and use cross-validation to select the best classifier.

Once you have your best classifier, you can compare it against the classifier provided in the post and keep whichever performs better.
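To make the cross-validation idea concrete, here is a minimal Java sketch (all names and the toy keyword model are my own illustration, not the classifier from the linked post). It trains a tiny unigram scorer on k-1 folds and measures accuracy on the held-out fold; in a real project the `train`/`classify` pair would be replaced by your actual classifier, e.g. one built with Weka.

```java
import java.util.*;

public class TweetCrossValidation {
    // A hand-labelled tweet: text plus a positive/negative label.
    record Example(String text, boolean positive) {}

    // Toy training step: words seen in positive tweets get +1,
    // words seen in negative tweets get -1.
    static Map<String, Integer> train(List<Example> trainSet) {
        Map<String, Integer> counts = new HashMap<>();
        for (Example e : trainSet)
            for (String w : e.text().toLowerCase().split("\\W+"))
                counts.merge(w, e.positive() ? 1 : -1, Integer::sum);
        return counts;
    }

    // Classify by summing the learned word scores over the tweet.
    static boolean classify(Map<String, Integer> counts, String text) {
        int score = 0;
        for (String w : text.toLowerCase().split("\\W+"))
            score += counts.getOrDefault(w, 0);
        return score >= 0;
    }

    // k-fold cross-validation: train on k-1 folds, test on the held-out
    // fold, and report overall accuracy across all folds.
    static double kFoldAccuracy(List<Example> data, int k) {
        int correct = 0;
        for (int fold = 0; fold < k; fold++) {
            List<Example> trainSet = new ArrayList<>();
            List<Example> testSet = new ArrayList<>();
            for (int i = 0; i < data.size(); i++)
                (i % k == fold ? testSet : trainSet).add(data.get(i));
            Map<String, Integer> model = train(trainSet);
            for (Example e : testSet)
                if (classify(model, e.text()) == e.positive()) correct++;
        }
        return (double) correct / data.size();
    }

    public static void main(String[] args) {
        // Tiny hand-labelled sample; a real training set would be far larger.
        List<Example> data = List.of(
            new Example("great win for united today", true),
            new Example("love this team so much", true),
            new Example("terrible defending again", false),
            new Example("what a great performance", true),
            new Example("awful result very poor", false),
            new Example("poor show terrible game", false));
        System.out.printf("3-fold accuracy: %.2f%n", kFoldAccuracy(data, 3));
    }
}
```

Running the cross-validation with different training-set sizes also answers the original question empirically: keep labelling tweets until the cross-validated accuracy stops improving.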

answered by Hernandcb