I recently started using weka and I'm trying to classify tweets into positive or negative using Naive Bayes. So I have a training set with tweets that I gave the label for and a test set with tweets that all have the label "positive". When I ran Naive Bayes, I get the following results:
Correctly classified instances: 69 92% Incorrectly classified instances: 6 8%
Then if I change the labels of the tweets in the test set to "negative" and ran again Naive Bayes, the results are inversed:
Correctly classified instances: 6 8% Incorrectly classified instances: 69 92%
I thought that correctly classified instances show the accuracy of Naive Bayes and that it should be the same no matter the labels of the tweets in test set. Is there something wrong with my data or I don't understand correctly the meaning of correctly classified instances?
Thanks a lot for your time,
Nantia
It is same because correctly classified instances means the sum of TP and TN. Similarly, incorrectly classified instances means the sum of FP and FN. The total number of correctly instances divided by total number of instances gives the accuracy.
Our classifier has got an accuracy of 92.4%. Weka even prints the Confusion matrix for you which gives different metrics.
The performance of the prediction is measured using accuracy, precision and recall. Results: Based on the analysis, the K-means clustering method is 78.59% accurate among the merit-based-admission students and 94.627% among the regular-admission students.
The labels on the test set are supposed to be the actual correct classification. Performance is computed by asking the classifier to give its best guess about the classification for each instance in the test set. Then the predicted classifications are compared to the actual classifications to determine accuracy. Therefore, if you flip the 'correct' values that you give it, the results will be flipped as well.
Based on your training set, 69.92% of your instances are classified as positive. If the labels for the test set, that is the correct answers, indicate that they are all positive, then that makes 69.92% correct. If the test set (and thus the classification) is the same, but you switch the correct answers, then of course, the percentage correct will also be the opposite.
Keep in mind that in order to evaluate a classifier, you need the true labels of the test set. Otherwise you can't compare the classifier's answers with the true answers. It seems to me that you might have misunderstood this. You can obtain the labels for unseen data, if that is what you want, but in that case you can't evaluate classifier accuracy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With