WEKA classification likelihood of the classes

I would like to know if there is a way in WEKA to output a number of 'best-guesses' for a classification.

My scenario: I classify the data (with cross-validation, for instance), and in Weka's output I would like to see something like "these are the 3 best guesses for the classification of this instance". In other words, even if an instance isn't correctly classified, I want an output of the 3 or 5 best guesses for that instance.

Example:

Classes: A,B,C,D,E Instances: 1...10

And the output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C, and so on.

Thanks.

user1454263 asked Aug 14 '12 20:08



2 Answers

Weka's API has a method called Classifier.distributionForInstance() that can be used to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get your top-N predictions.

Below is a function that prints out: (1) the test instance's ground truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.

The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.

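// Imports needed by this method (place them at the top of the enclosing class file).
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public void test(String modelFileSerialized, String testFileARFF) 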
    throws Exception
{
    // Deserialize the classifier.
    Classifier classifier = 
        (Classifier) weka.core.SerializationHelper.read(
            modelFileSerialized);

    // Load the test instances.
    Instances testInstances = DataSource.read(testFileARFF);

    // Mark the last attribute in each instance as the true class.
    testInstances.setClassIndex(testInstances.numAttributes()-1);

    int numTestInstances = testInstances.numInstances();
    System.out.printf("There are %d test instances\n", numTestInstances);

    // Loop over each test instance.
    for (int i = 0; i < numTestInstances; i++)
    {
        // Get the true class label from the instance's own classIndex.
        String trueClassLabel = 
            testInstances.instance(i).toString(testInstances.classIndex());

        // Make the prediction here.
        double predictionIndex = 
            classifier.classifyInstance(testInstances.instance(i)); 

        // Get the predicted class label from the predictionIndex.
        String predictedClassLabel =
            testInstances.classAttribute().value((int) predictionIndex);

        // Get the prediction probability distribution.
        double[] predictionDistribution = 
            classifier.distributionForInstance(testInstances.instance(i)); 

        // Print out the true label, predicted label, and the distribution.
        System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=", 
                          i, trueClassLabel, predictedClassLabel); 

        // Loop over all the prediction labels in the distribution.
        for (int predictionDistributionIndex = 0; 
             predictionDistributionIndex < predictionDistribution.length; 
             predictionDistributionIndex++)
        {
            // Get this distribution index's class label.
            String predictionDistributionIndexAsClassLabel = 
                testInstances.classAttribute().value(
                    predictionDistributionIndex);

            // Get the probability.
            double predictionProbability = 
                predictionDistribution[predictionDistributionIndex];

            System.out.printf("[%10s : %6.3f]", 
                              predictionDistributionIndexAsClassLabel, 
                              predictionProbability );
        }

        System.out.printf("\n");
    }
}

stackoverflowuser2010 answered Oct 05 '22 23:10


I don't know if you can do it natively, but you can just get the probabilities for each class, sort them, and take the first three.

The function you want is distributionForInstance(Instance instance) which returns a double[] giving the probability for each class.
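As a rough sketch of the "sort and take the first three" step: the helper below is plain Java with no Weka dependency, so the double[] passed in is only a stand-in for whatever distributionForInstance() returns. The class name TopNPredictions, the method name topNIndices, and the sample labels and probabilities are all made up for illustration.

```java
import java.util.Arrays;

public class TopNPredictions {

    /**
     * Returns the indices of the n most probable classes,
     * ordered from most to least probable.
     * If n exceeds the number of classes, all indices are returned.
     */
    public static int[] topNIndices(double[] distribution, int n) {
        // Box the class indices so they can be sorted with a comparator.
        Integer[] indices = new Integer[distribution.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }

        // Sort class indices by decreasing probability.
        Arrays.sort(indices,
                (a, b) -> Double.compare(distribution[b], distribution[a]));

        // Keep only the first n indices (or fewer if there are fewer classes).
        int count = Math.max(0, Math.min(n, indices.length));
        int[] top = new int[count];
        for (int k = 0; k < count; k++) {
            top[k] = indices[k];
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for the class labels and
        // for the array returned by distributionForInstance().
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] distribution = {0.05, 0.55, 0.10, 0.25, 0.05};

        // Print the three best guesses for this one instance.
        for (int classIndex : topNIndices(distribution, 3)) {
            System.out.printf("%s : %.3f%n",
                    labels[classIndex], distribution[classIndex]);
        }
    }
}
```

With real Weka data you would presumably map each returned index to its label through the dataset's classAttribute().value(index) rather than a hand-written labels array.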

Antimony answered Oct 06 '22 00:10