How to rank features by their importance in a Weka classifier?

I used Weka to successfully build a classifier. I would now like to evaluate how effective or important my features are. For this I use AttributeSelection, but I don't know how to output the different features with their corresponding importance. I simply want to list the features in decreasing order of their information gain scores!

khadre asked Jan 21 '14 20:01



1 Answer

There are many ways of scoring features (which Weka calls attributes). These methods are available as subclasses of weka.attributeSelection.ASEvaluation.

Any of these evaluation classes will give you a score for each attribute. If you use information gain for scoring, for example, you will be using the class InfoGainAttributeEval. The helpful methods are

  • InfoGainAttributeEval.buildEvaluator(Instances), and
  • InfoGainAttributeEval.evaluateAttribute(int)

The other types of feature scoring (gain ratio, correlation, etc.) have the same methods for scoring. Using any of these, you can rank all your features.
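For intuition about what an information gain score measures, here is a minimal sketch of the textbook definition, H(class) - H(class | attribute), for a single nominal attribute. It is not Weka's implementation and does not use the Weka API; the class name InfoGainSketch, its method names, and the toy data are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class InfoGainSketch {

    // Shannon entropy (in bits) of a distribution given as label -> count.
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain of one nominal attribute with respect to the class labels.
    // attrValues[i] and labels[i] describe the same instance.
    static double infoGain(String[] attrValues, String[] labels) {
        int n = labels.length;
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> countsPerValue = new HashMap<>();
        Map<String, Integer> valueTotals = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(labels[i], 1, Integer::sum);
            valueTotals.merge(attrValues[i], 1, Integer::sum);
            countsPerValue.computeIfAbsent(attrValues[i], k -> new HashMap<>())
                          .merge(labels[i], 1, Integer::sum);
        }
        // H(class | attribute): entropy within each attribute value, weighted by its frequency.
        double conditional = 0.0;
        for (Map.Entry<String, Map<String, Integer>> e : countsPerValue.entrySet()) {
            int total = valueTotals.get(e.getKey());
            conditional += ((double) total / n) * entropy(e.getValue(), total);
        }
        return entropy(classCounts, n) - conditional;
    }

    public static void main(String[] args) {
        String[] labels = {"yes", "yes", "no", "no"};
        // An attribute that separates the classes perfectly vs. one that tells us nothing.
        System.out.println(infoGain(new String[] {"a", "a", "b", "b"}, labels));
        System.out.println(infoGain(new String[] {"a", "b", "a", "b"}, labels));
    }
}
```

A higher score means that knowing the attribute's value removes more uncertainty about the class, which is why sorting by this score in decreasing order gives an importance ranking.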

The ranking itself is independent of Weka. Of the many ways of doing it, this is one:

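// Assumes `instances` is a weka.core.Instances with its class index set, and `evaluation`
// is an InfoGainAttributeEval on which buildEvaluator(instances) has already been called.
Map<Attribute, Double> infogainscores = new HashMap<Attribute, Double>();
for (int i = 0; i < instances.numAttributes(); i++) {
    Attribute t_attr = instances.attribute(i);
    double infogain  = evaluation.evaluateAttribute(i);  // declared to throw Exception
    infogainscores.put(t_attr, infogain);
}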

Now you have a map which needs to be sorted by value. Here's some generic code to do that:

/**
 * Provides a {@code SortedSet} of {@code Map.Entry} objects. The sorting is in ascending order if {@code order} > 0
 * and descending order if {@code order} <= 0.
 * @param map   The map to be sorted.
 * @param order The sorting order (positive means ascending, non-positive means descending).
 * @param <K>   Keys.
 * @param <V>   Values need to be {@code Comparable}.
 * @return      A sorted set of {@code Map.Entry} objects.
 */
static <K,V extends Comparable<? super V>> SortedSet<Map.Entry<K,V>>
entriesSortedByValues(Map<K,V> map, final int order) {
    SortedSet<Map.Entry<K,V>> sortedEntries = new TreeSet<>(
        new Comparator<Map.Entry<K,V>>() {
            @Override
            public int compare(Map.Entry<K,V> e1, Map.Entry<K,V> e2) {
                return (order > 0)
                    ? compareToRetainDuplicates(e1.getValue(), e2.getValue())
                    : compareToRetainDuplicates(e2.getValue(), e1.getValue());
            }
        }
    );
    sortedEntries.addAll(map.entrySet());
    return sortedEntries;
}

and finally,

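// Never returns 0, so the TreeSet treats equal values as distinct and keeps both entries.
private static <V extends Comparable<? super V>> int compareToRetainDuplicates(V v1, V v2) {
    // compareTo may return any negative number, not just -1, so test the sign.
    return (v1.compareTo(v2) < 0) ? -1 : 1;
}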

Now you have a sorted set of entries ordered by value (ascending or descending, as you wish). Go crazy with it!

Please note that you should handle the case where more than one attribute has the same information gain: a TreeSet silently drops any entry that its comparator reports as equal to one already in the set. That is why I went through the process of sorting by values while retaining duplicates.
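If you are on Java 8 or later, one alternative that sidesteps the tie problem is to sort a list of entries instead of using a TreeSet, because a list never discards elements. The following is only a sketch of that idea, not part of the approach above: the class name RankByScore, the method rankDescending, and the attribute names and scores are invented for illustration, and plain String keys stand in for Weka Attribute objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RankByScore {

    // Copies the map's entries into a list and sorts it by value, highest first.
    // List.sort is stable, so entries with equal scores are all kept and stay
    // in the order in which the map iterated over them.
    static <K, V extends Comparable<? super V>> List<Map.Entry<K, V>> rankDescending(Map<K, V> scores) {
        List<Map.Entry<K, V>> ranked = new ArrayList<>(scores.entrySet());
        ranked.sort(Map.Entry.<K, V>comparingByValue().reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up scores; in practice these would come from evaluateAttribute(i).
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("outlook", 0.2);
        scores.put("humidity", 0.9);
        scores.put("windy", 0.2);
        scores.put("temperature", 0.5);
        for (Map.Entry<String, Double> e : rankDescending(scores)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The comparator here is allowed to return 0 for ties, which keeps it consistent with the Comparator contract; the trade-off is that you get a List rather than a SortedSet.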

Chthonic Project answered Nov 15 '22 08:11