Measuring the performance of a classification algorithm

Question

I have a classification problem that I'd like to address with a machine learning algorithm (probably Bayesian or Markovian; the question is independent of the classifier used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classifier while taking the problem of overfitting into account.

That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples and then use these very same samples to measure fitness, I may run into overfitting: the classifier will know the exact answers for the training instances without having much predictive power, rendering the fitness results useless.

An obvious solution would be to separate the hand-tagged samples into training and test samples; I'd like to learn about methods for selecting statistically significant samples for training.

White papers, book pointers, and PDFs much appreciated!

Rockcoder · Accepted Answer

You could use 10-fold cross-validation for this. I believe it's a pretty standard approach for evaluating the performance of classification algorithms.

The basic idea is to divide your learning samples into 10 subsets. Then use one subset as test data and the other nine as training data. Repeat this for each subset and calculate the average performance at the end.
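To make the procedure concrete, here is a minimal sketch in plain Python. The names `k_fold_indices`, `cross_validate`, `train_majority`, and `predict_majority` are my own illustrative choices, not part of any library, and the majority-class "classifier" is only a placeholder for whatever Bayes or Markov model you actually train. The sample data is made up for the example.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k disjoint folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, repeat for every fold.

    Returns the list of per-fold accuracies and their mean.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies, sum(accuracies) / len(accuracies)


# Placeholder classifier (an assumption for this sketch): always predicts
# the most frequent label seen during training.
def train_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


# Made-up data: 100 hand-tagged samples, 70 "spam" and 30 "ham".
samples = list(range(100))
labels = ["spam"] * 70 + ["ham"] * 30

fold_accuracies, mean_accuracy = cross_validate(
    samples, labels, train_majority, predict_majority, k=10)
print(f"mean accuracy over {len(fold_accuracies)} folds: {mean_accuracy:.2f}")
```

Because no sample is ever used for both training and testing within the same fold, the averaged score reflects performance on unseen data. The majority baseline is also a useful sanity check: a real classifier should beat the score it gets on the same folds.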

Mark Davidson · Answer

As Mr. Brownstone said, 10-fold cross-validation is probably the best way to go. I recently had to evaluate the performance of a number of different classifiers, and for this I used Weka, which has an API and a set of tools that let you easily test the performance of many different classifiers.

Tags: artificial-intelligence, machine-learning, classification, nlp, bayesian

Silver Dragon
