How to purposely overfit Weka tree classifiers?

I have a binary class dataset (0 / 1) with a large skew towards the "0" class (about 30000 vs 1500). There are 7 features for each instance, no missing values.

When I use the J48 or any other tree classifier, I get almost all of the "1" instances misclassified as "0".

Setting the classifier to "unpruned", setting the minimum number of instances per leaf to 1, setting the confidence factor to 1, adding a dummy attribute with the instance ID number - none of this helped.
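
For reference, this is roughly that configuration through Weka's Java API (just a sketch - "data.arff" stands in for my dataset, and the setters correspond to the -U and -M 1 options):

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class OverfitJ48 {
    public static void main(String[] args) throws Exception {
        // "data.arff" is a placeholder for the real dataset
        Instances data = DataSource.read("data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        J48 tree = new J48();
        tree.setUnpruned(true);  // -U: turn pruning off
        tree.setMinNumObj(1);    // -M 1: allow leaves covering a single instance
        // the confidence factor (-C) only affects pruning, so it is moot with -U

        tree.buildClassifier(data);
        System.out.println(tree);  // inspect how deep the tree actually grows
    }
}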

I just can't create a model that overfits my data!

I've also tried almost all of the other classifiers Weka provides, but got similar results.

Using IB1 gets 100% accuracy (train set on train set), so it's not a problem of multiple instances having the same feature values but different classes.

How can I create a completely unpruned tree? Or otherwise force Weka to overfit my data?

Thanks.

Update: Okay, this is absurd. I've used only about 3100 negative and 1200 positive examples, and this is the tree I got (unpruned!):

J48 unpruned tree
------------------

F <= 0.90747: 1 (201.0/54.0)
F > 0.90747: 0 (4153.0/1062.0)

Needless to say, IB1 still gives 100% precision.

Update 2: I don't know how I missed it - unpruned SimpleCart works and gives 100% accuracy train-on-train; pruned SimpleCart is less biased than J48 and has decent false positive and false negative rates.
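
For anyone trying to reproduce this, a command line like the one below should train and report training-set results ("data.arff" stands in for my file; -U is SimpleCart's no-pruning switch as far as I can tell - check the help output for your Weka version):

java -cp weka.jar weka.classifiers.trees.SimpleCart -U -t data.arff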

asked Jul 11 '10 by Haggai


1 Answer

Weka contains two meta-classifiers of interest:

  • weka.classifiers.meta.CostSensitiveClassifier
  • weka.classifiers.meta.MetaCost

These allow you to make any algorithm cost-sensitive (not restricted to SVMs) and to specify a cost matrix (the penalty assigned to each type of error); here you would give a higher penalty for misclassifying "1" instances as "0" than for misclassifying "0" instances as "1".

The result is that the algorithm then tries to minimize the expected misclassification cost rather than simply predicting the most likely class.
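
A minimal sketch of the first option via the Java API (the cost of 20 and the "data.arff" path are made up for illustration; the setter names match recent Weka releases and may differ slightly in older ones):

import weka.classifiers.CostMatrix;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CostSensitiveExample {
    public static void main(String[] args) throws Exception {
        // "data.arff" is a placeholder for the real dataset
        Instances data = DataSource.read("data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 2x2 cost matrix: rows = actual class, columns = predicted class.
        // The 20.0 penalty is an arbitrary illustration, chosen to roughly
        // offset the ~20:1 class imbalance (30000 vs 1500).
        CostMatrix costs = new CostMatrix(2);
        costs.setCell(0, 1, 1.0);   // actual "0" predicted as "1": cheap
        costs.setCell(1, 0, 20.0);  // actual "1" predicted as "0": expensive

        CostSensitiveClassifier csc = new CostSensitiveClassifier();
        csc.setClassifier(new J48());
        csc.setCostMatrix(costs);
        // predict the class minimizing expected cost,
        // instead of reweighting the training instances
        csc.setMinimizeExpectedCost(true);
        csc.buildClassifier(data);
        System.out.println(csc);
    }
}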

answered Oct 05 '22 by Amro