Weka J48 Classifier: Cannot handle numeric class?

I'm now trying to build a J48 (C4.5) classifier model on my training data using Weka.

First I do this, which seems to go OK:

java -Xmx10G -cp /weka/weka.jar weka.core.converters.TextDirectoryLoader -dir /home/test/cats > /home/test/cats.arff

This seems to go OK too:

java -Xmx10G -cp /weka/weka.jar weka.filters.unsupervised.attribute.StringToWordVector -i /home/test/cats.arff -o /home/test/cats-vector.arff

This does not go OK:

java -Xmx10G -cp /weka/weka.jar weka.classifiers.trees.J48 -t /home/test/cats-vector.arff -d /home/test/cats.model

It gives the following error:

weka.core.UnsupportedAttributeTypeException: weka.classifiers.trees.j48.C45PruneableClassifierTree: Cannot handle numeric class!
        at weka.core.Capabilities.test(Capabilities.java:954)
        at weka.core.Capabilities.test(Capabilities.java:1110)
        at weka.core.Capabilities.test(Capabilities.java:1023)
        at weka.core.Capabilities.testWithFail(Capabilities.java:1302)
        at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(C45PruneableClassifierTree.java:116)
        at weka.classifiers.trees.J48.buildClassifier(J48.java:236)
        at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1076)
        at weka.classifiers.Classifier.runClassifier(Classifier.java:312)
        at weka.classifiers.trees.J48.main(J48.java:948)

So I then tried this:

java -Xmx10G -cp /weka/weka.jar weka.classifiers.trees.J48 -t /home/test/cats.arff -d /home/test/cats.model

This gives a different error:

weka.core.UnsupportedAttributeTypeException: weka.classifiers.trees.j48.C45PruneableClassifierTree: Cannot handle string attributes!
        at weka.core.Capabilities.test(Capabilities.java:980)
        at weka.core.Capabilities.test(Capabilities.java:869)
        at weka.core.Capabilities.test(Capabilities.java:1085)
        at weka.core.Capabilities.test(Capabilities.java:1023)
        at weka.core.Capabilities.testWithFail(Capabilities.java:1302)
        at weka.classifiers.trees.j48.C45PruneableClassifierTree.buildClassifier(C45PruneableClassifierTree.java:116)
        at weka.classifiers.trees.J48.buildClassifier(J48.java:236)
        at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1076)
        at weka.classifiers.Classifier.runClassifier(Classifier.java:312)
        at weka.classifiers.trees.J48.main(J48.java:948)

Obviously I've prepared the data wrong somehow (BTW the input is text files in subdirectories named after the categories I want). But I thought I was following the instructions on the Weka Wiki (the "Categorizing Text Files" page and the Primer).

So what am I doing wrong? I'd like to use J48 because it has given high accuracy on my data in tests. What do I need to do to my data so that the J48 classifier accepts it, or do I need to use a different classifier?

Please help!


1 Answer

The word vectors could be converted to binary like this:

java -Xmx4G -cp /weka/weka.jar weka.filters.unsupervised.attribute.NumericToBinary -i /home/test/cats-vector.arff -o /home/test/cats-binary.arff

Note that this adds a bias to the kind of data you are training on: binary vectors that are close to one another are treated as more similar than vectors that are far apart. If you want to remove that bias and treat each string as a completely distinct entity, declare the class as a nominal attribute instead, e.g. @attribute class {ABC, DEF, GHI}. Then it works!
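
For example, a minimal ARFF header along those lines (the relation, attribute, and category names here are only placeholders, not taken from your data) lists the class values explicitly instead of leaving the class numeric or string:

@relation cats
@attribute text string
@attribute class {ABC, DEF, GHI}
@data
'first document text ...', ABC
'second document text ...', DEF

The string attribute still has to go through StringToWordVector before J48 will accept it, but with the class declared this way the "Cannot handle numeric class" error should go away.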

If you really want to communicate that these features are distinct and not related at all, create a separate column for each string, holding the value 1 when a row has that category and 0 when it does not. This produces very sparse data, but it biases the learning algorithm toward scanning those columns for information gain.
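
As a rough sketch of that layout (again with placeholder names; ABC, DEF and GHI stand in for whatever strings you actually have), each string gets its own 0/1 attribute and each row switches on exactly one of them:

@relation cats-indicator
@attribute has_ABC {0, 1}
@attribute has_DEF {0, 1}
@attribute has_GHI {0, 1}
@data
1, 0, 0
0, 1, 0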
