How to ignore a feature while including it as part of feature set in Weka GUI

Tags:

weka

I am using the Weka GUI to run a NaiveBayes classifier on online posts. I am trying to track the instances (online posts) that are predicted incorrectly so that I can learn how to improve my features.

Currently, I have a workaround for this: I generate the data with a unique ID included, and when I import it into Weka I remove the unique ID. I then attach the prediction appender, which saves the prediction results to an .arff file. I read through that file to find the instances that were predicted badly. For each incorrectly classified instance, I take a combination of feature values that is unique enough to identify it and look up the row with the same values in my original data, which contains the unique ID. As you can see, this is a very time-consuming process.

I would love to hear if there is a way to ignore a feature, which in my case is the unique ID of an instance, while keeping it as part of the data when running the classifier.

Thank you.


asked Sep 23 '12 05:09

Jina Huh

1 Answer

I'm not sure whether the Weka GUI has a direct option for that. However, you can achieve the same thing from the command line:

java weka.classifiers.meta.FilteredClassifier -F weka.filters.unsupervised.attribute.RemoveType -W weka.classifiers.trees.RandomForest -t G:\pub-resampled-0.5.arff -T G:\test.csv.arff -p 1 -distribution > G:\out.txt

In the above example, the first attribute is an identifier (a string). The RemoveType filter, which by default targets string attributes, removes all string fields before the model is built. However, you can still ask Weka to include that identifier in the output (the predictions) by passing its index as the argument to -p. In my case the first attribute (partner_id) is the identifier, so it is listed in the output alongside the predictions. (The -distribution option outputs the prediction scores for all class labels.) You can get more details from http://weka.wikispaces.com/Instance+ID

=== Predictions on test data ===

 inst#     actual  predicted error distribution (partner_id)
     1        1:?        2:0       0,*1 (8i7t3)
     2        1:?        2:0       0,*1 (8i7u1)
     3        1:?        2:0       0,*1 (8i7um)
     4        1:?        2:0       0.1,*0.9 (8i7ux)
     5        1:?        2:0       0,*1 (8i7va)
     6        1:?        2:0       0,*1 (8i7vb)
     7        1:?        2:0       0,*1 (8i7vf)
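Since the goal is to find the incorrectly predicted posts, the saved output can then be filtered down to the misclassified rows. Below is a minimal sketch (not part of Weka) that prints the ID of each misclassified test instance from a file like out.txt. It assumes the test set has known class labels (in the sample above the actual column is `?`, so nothing would be flagged), that Weka marks a wrong prediction with `+` in the error column, and that the ID attribute passed to -p contains no spaces. The function name `misclassified_ids` is hypothetical.

```shell
#!/bin/sh
# Print the ID of every misclassified test instance from Weka's plain-text
# prediction output (as produced with -p 1, with or without -distribution).
# Assumptions: a wrong prediction has "+" as the 4th whitespace-separated
# field, and the -p attribute is the last field, wrapped in parentheses.
misclassified_ids() {
  awk '
    # Data rows start with the instance number; skip headers and blank lines.
    $1 ~ /^[0-9]+$/ && $4 == "+" {
      id = $NF
      gsub(/[()]/, "", id)   # strip the surrounding parentheses
      print id
    }
  ' "$@"   # reads the named file(s), or stdin if none are given
}
```

For example, `misclassified_ids out.txt > wrong_ids.txt` writes one ID per line, which can then be looked up directly in the original data instead of matching on feature values. When a prediction is correct the error column is blank, so the 4th field is the distribution (or prediction) value and never equals `+`.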

Hope you find this helpful.


answered Sep 18 '22 02:09

naresh
