WEKA classification likelihood of the classes

Tags:

weka

I would like to know if there is a way in WEKA to output a number of 'best-guesses' for a classification.

My scenario is: I classify the data with cross-validation, for instance, and then in Weka's output I would like to see something like: these are the 3 best guesses for the classification of this instance. Even if an instance isn't classified correctly, I want to get the 3 or 5 best guesses for that instance.

Example:

Classes: A,B,C,D,E Instances: 1...10

And the output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C, ...

Thanks.

567

asked Aug 14 '12 20:08

user1454263

2 Answers

Weka's API has a method called Classifier.distributionForInstance() that can be used to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get your top-N predictions.

Below is a function that prints out: (1) the test instance's ground truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.

The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.

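import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public void test(String modelFileSerialized, String testFileARFF) 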
    throws Exception
{
    // Deserialize the classifier.
    Classifier classifier = 
        (Classifier) weka.core.SerializationHelper.read(
            modelFileSerialized);

    // Load the test instances.
    Instances testInstances = DataSource.read(testFileARFF);

    // Mark the last attribute in each instance as the true class.
    testInstances.setClassIndex(testInstances.numAttributes()-1);

    int numTestInstances = testInstances.numInstances();
    System.out.printf("There are %d test instances\n", numTestInstances);

    // Loop over each test instance.
    for (int i = 0; i < numTestInstances; i++)
    {
        // Get the true class label from the instance's own classIndex.
        String trueClassLabel = 
            testInstances.instance(i).toString(testInstances.classIndex());

        // Make the prediction here.
        double predictionIndex = 
            classifier.classifyInstance(testInstances.instance(i)); 

        // Get the predicted class label from the predictionIndex.
        String predictedClassLabel =
            testInstances.classAttribute().value((int) predictionIndex);

        // Get the prediction probability distribution.
        double[] predictionDistribution = 
            classifier.distributionForInstance(testInstances.instance(i)); 

        // Print out the true label, predicted label, and the distribution.
        System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=", 
                          i, trueClassLabel, predictedClassLabel); 

        // Loop over all the prediction labels in the distribution.
        for (int predictionDistributionIndex = 0; 
             predictionDistributionIndex < predictionDistribution.length; 
             predictionDistributionIndex++)
        {
            // Get this distribution index's class label.
            String predictionDistributionIndexAsClassLabel = 
                testInstances.classAttribute().value(
                    predictionDistributionIndex);

            // Get the probability.
            double predictionProbability = 
                predictionDistribution[predictionDistributionIndex];

            System.out.printf("[%10s : %6.3f]", 
                              predictionDistributionIndexAsClassLabel, 
                              predictionProbability );
        }

        System.out.printf("\n");
    }
}

137

answered Oct 05 '22 23:10

stackoverflowuser2010

I don't know if you can do it natively, but you can just get the probabilities for each class, sort them, and take the first three.

The function you want is distributionForInstance(Instance instance) which returns a double[] giving the probability for each class.
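The sort-and-take-the-first-N step could look roughly like the sketch below. It is plain Java with no Weka dependency: the class name TopNPredictions, the helper topN, and the sample labels and probabilities are made up for illustration. In real use, the double[] would come from distributionForInstance() and the label for index i from the dataset's classAttribute().value(i).

```java
import java.util.Arrays;

public class TopNPredictions {

    // Returns the indices of the n most probable classes, highest probability first.
    // If n exceeds the number of classes, all class indices are returned.
    public static int[] topN(double[] distribution, int n) {
        Integer[] indices = new Integer[distribution.length];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = i;
        }

        // Sort class indices by decreasing probability.
        // Arrays.sort on objects is stable, so ties keep the lower index first.
        Arrays.sort(indices, (a, b) -> Double.compare(distribution[b], distribution[a]));

        int count = Math.min(n, indices.length);
        int[] top = new int[count];
        for (int i = 0; i < count; i++) {
            top[i] = indices[i];
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical values standing in for classAttribute().value(i)
        // and the result of distributionForInstance(instance).
        String[] classLabels = {"A", "B", "C", "D", "E"};
        double[] distribution = {0.10, 0.55, 0.05, 0.25, 0.05};

        for (int classIndex : topN(distribution, 3)) {
            System.out.printf("%s : %.3f%n",
                              classLabels[classIndex], distribution[classIndex]);
        }
    }
}
```

Because the helper returns class indices rather than labels, the same indices can be used to look up both the label and the probability.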

answered Oct 06 '22 00:10

Antimony
