
How to ignore a feature while including it as part of feature set in Weka GUI

Tags: weka

I am using the Weka GUI to run a NaiveBayes classifier on online posts. I want to track the instances (online posts) that are predicted incorrectly so that I can learn how to improve the features.

Currently I have a workaround: I generate the data with a unique ID included, and when I import it into Weka I remove the unique ID. I then attach the prediction appender, which saves the prediction results to an .arff file, and read through that file to find the instances that were classified incorrectly. For each of those, I pick feature values that are distinctive enough to identify the instance and look up the instance with the same values in my original data, which contains the unique ID. As you can see, this is a very time-consuming process.

Is there a way to ignore a feature, in my case the unique ID of an instance, while still keeping it as part of the data when running the classifier?

Thank you.

Jina Huh asked Sep 23 '12 05:09



1 Answer

I'm not sure whether the Weka GUI has a direct option for this. However, you can achieve the same result from the command line:

java weka.classifiers.meta.FilteredClassifier -F weka.filters.unsupervised.attribute.RemoveType -W weka.classifiers.trees.RandomForest -t G:\pub-resampled-0.5.arff -T G:\test.csv.arff -p 1 -distribution > G:\out.txt

In the above example, the first attribute is an identifier (a string). By default, the RemoveType filter removes all string attributes while the model is built. However, you can still ask Weka to include the identifier in the output (predictions) by passing its attribute index as the argument to -p. In my case the first attribute (partner_id) is the identifier, so it is listed in the output along with the predictions. (The -distribution option outputs prediction scores for all class labels.) You can find more details at http://weka.wikispaces.com/Instance+ID

=== Predictions on test data ===

 inst#     actual  predicted error distribution (partner_id)
     1        1:?        2:0       0,*1 (8i7t3)
     2        1:?        2:0       0,*1 (8i7u1)
     3        1:?        2:0       0,*1 (8i7um)
     4        1:?        2:0       0.1,*0.9 (8i7ux)
     5        1:?        2:0       0,*1 (8i7va)
     6        1:?        2:0       0,*1 (8i7vb)
     7        1:?        2:0       0,*1 (8i7vf)
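Once the identifier appears next to each prediction, pulling out the misclassified instances can be scripted. The following is only a rough sketch in Python, not part of Weka: it assumes the column layout shown above and relies on Weka placing a "+" in the error column when the predicted class differs from the actual one. The sample rows, the attribute name post_id, the IDs, and the function names are all made up for illustration; the listing above has unknown actual classes ("?"), so no error marker appears there. Class labels containing spaces would need a more careful parser.

```python
import re

# One prediction row: inst#, actual, predicted, optional "+" error marker,
# prediction/distribution column, and the identifier in parentheses.
LINE_RE = re.compile(
    r"""
    ^\s*(?P<inst>\d+)\s+
    (?P<actual>\d+:\S+)\s+
    (?P<predicted>\d+:\S+)\s+
    (?P<error>\+\s+)?
    (?P<dist>\S+)\s+
    \((?P<id>[^)]*)\)\s*$
    """,
    re.VERBOSE,
)


def parse_predictions(text):
    """Return one dict per prediction row; header and blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        rows.append(
            {
                "inst": int(m.group("inst")),
                "actual": m.group("actual"),
                "predicted": m.group("predicted"),
                "error": bool(m.group("error")),
                "distribution": m.group("dist"),
                "id": m.group("id"),
            }
        )
    return rows


def misclassified_ids(text):
    """Return the identifiers of rows flagged with '+' in the error column."""
    return [row["id"] for row in parse_predictions(text) if row["error"]]


# Hypothetical output for labeled test data (IDs and labels are invented).
SAMPLE = """\
=== Predictions on test data ===

 inst#     actual  predicted error distribution (post_id)
     1        1:yes      1:yes       *0.9,0.1 (p001)
     2        1:yes      2:no    +   0.2,*0.8 (p002)
     3        2:no       2:no        0,*1 (p003)
     4        2:no       1:yes   +   *0.7,0.3 (p004)
"""

if __name__ == "__main__":
    print(misclassified_ids(SAMPLE))  # prints ['p002', 'p004']
```

The returned IDs can then be looked up directly in the original data, which replaces the manual matching on feature values.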

Hope you find this helpful.

naresh answered Sep 18 '22 02:09