Resample Filter of WEKA - How to interpret the result

I am currently struggling with a machine learning problem where I have to deal with highly unbalanced data sets. There are six classes ('1', '2', ..., '6'). Unfortunately, there are, e.g., 150 examples/instances for class '1', 90 for class '2', and only 20 for class '3'. The remaining classes can't be "trained" at all, since there are no instances available for them.

So far, I have found that WEKA (the machine learning toolkit I am using) provides the supervised "Resample" filter. When I apply this filter with 'noReplacement'=false and 'biasToUniformClass'=1.0, the result is a data set in which the number of instances is nicely balanced: almost equal for classes '1' to '3', while the other classes stay empty.
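Why the counts come out "almost equal" and why the empty classes stay empty can be estimated with a small calculation. This is a sketch under an assumed model of the filter, not WEKA's source: when sampling with replacement, each draw, with probability b = `biasToUniformClass`, picks uniformly among the k classes that actually have instances; otherwise it picks a random instance of the whole data set. Under that assumption, a draw lands in class c with probability b/k + (1 - b) * n_c / N. The output size is assumed to be 100% of the input (the assumed default of WEKA's `sampleSizePercent` option), and the function name `expected_class_counts` is hypothetical.

```python
def expected_class_counts(class_sizes, bias, sample_size=None):
    """Expected number of drawn instances per class under biased resampling.

    Assumed model (not WEKA's code): with probability `bias` a draw picks a
    class uniformly among the non-empty classes, otherwise it picks an
    instance uniformly from the whole data set.
    """
    total = sum(class_sizes.values())
    if sample_size is None:
        sample_size = total  # assumption: output size = 100% of the input
    k = sum(1 for n in class_sizes.values() if n > 0)  # non-empty classes only
    expected = {}
    for c, n in class_sizes.items():
        if n == 0:
            expected[c] = 0.0  # nothing to draw from, so the class stays empty
        else:
            p = bias / k + (1 - bias) * n / total  # per-draw class probability
            expected[c] = p * sample_size
    return expected


sizes = {"1": 150, "2": 90, "3": 20, "4": 0, "5": 0, "6": 0}
for bias in (0.0, 0.5, 1.0):
    counts = expected_class_counts(sizes, bias)
    print(bias, {c: round(v, 1) for c, v in counts.items()})
```

With the counts from the question (150 + 90 + 20 = 260 instances, three non-empty classes), a bias of 1.0 gives an expected 260 / 3 ≈ 86.7 draws per class, which matches the "almost equal" result; classes '4' to '6' get 0 because there is nothing to draw from. A bias of 0.0 simply reproduces the original distribution.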

My question now is: how do WEKA and this filter generate "new"/additional instances for the different classes?

Thank you very much in advance for any hints or suggestions.

Cheers Julian

asked Dec 09 '09 15:12 by Julian



2 Answers

It doesn't. It resamples existing instances. If you have a single class-2 instance and ask for resampling with a bias of 1.0, you can expect N copies of that instance, and N instances of each other class for which there is already data.
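The point that only existing instances come out can be made concrete with a small simulation. This is a minimal sketch of an assumed procedure, not WEKA's implementation: for each draw, with probability `bias` it picks one of the classes that actually occur (uniformly) and then a random instance of that class; otherwise it picks a random instance of the whole set, always with replacement. The function `biased_resample` and its `seed` parameter are hypothetical.

```python
import random
from collections import Counter


def biased_resample(labels, bias, sample_size=None, seed=1):
    """Return indices of a resample drawn WITH replacement (assumed procedure).

    Only indices of existing instances are returned; nothing new is synthesized.
    """
    rng = random.Random(seed)
    if sample_size is None:
        sample_size = len(labels)  # assumption: output size = input size
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    classes = sorted(by_class)  # only classes that have at least one instance
    picked = []
    for _ in range(sample_size):
        if rng.random() < bias:
            # pick a class uniformly, then a random member of that class
            members = by_class[rng.choice(classes)]
            picked.append(rng.choice(members))
        else:
            # pick any instance, so the original class proportions apply
            picked.append(rng.randrange(len(labels)))
    return picked


labels = ["1"] * 150 + ["2"] * 90 + ["3"] * 20
idx = biased_resample(labels, bias=1.0)
print(Counter(labels[i] for i in idx))  # roughly equal counts per class
print(len(set(idx)), "distinct originals in", len(idx), "rows")
```

Every output row is a copy of an input row, so the 20 class-'3' instances have to be repeated several times each to fill their share, and classes with no data never appear.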

answered Oct 13 '22 00:10 by James



WEKA's supervised Resample filter adds instances to a class. This is done simply by adding instances of the class that has only a few instances to the resulting data set multiple times.

Therefore, for a class with only a few available samples, the resulting data set is strongly biased: that class is represented by many copies of the same few instances.
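How strong that duplication is can be estimated with a short calculation. It is a sketch that assumes a bias of 1.0, sampling with replacement, an output the same size as the input, and that each draw first picks a non-empty class uniformly and then an instance of it uniformly; the helper `duplication_stats` is hypothetical. A given instance of a class with n_c members is then hit with probability 1 / (k * n_c) per draw.

```python
def duplication_stats(class_sizes, sample_size):
    """Per class: (expected copies of each original, P(an original is never drawn)).

    Assumes bias = 1.0 and draws with replacement: each draw picks a
    non-empty class uniformly, then an instance of that class uniformly.
    """
    non_empty = {c: n for c, n in class_sizes.items() if n > 0}
    k = len(non_empty)
    stats = {}
    for c, n in non_empty.items():
        p_instance = 1.0 / (k * n)                  # chance one draw hits this instance
        copies = sample_size * p_instance           # expected copies per original
        p_missing = (1.0 - p_instance) ** sample_size  # never drawn in any draw
        stats[c] = (copies, p_missing)
    return stats


for c, (copies, missing) in duplication_stats({"1": 150, "2": 90, "3": 20}, 260).items():
    print(f"class {c}: {copies:.2f} copies per original, P(never drawn) = {missing:.3f}")
```

Under these assumptions each of the 20 class-'3' instances is expected to appear about 260 / (3 * 20) ≈ 4.3 times, while each class-'1' instance is expected only about 260 / 450 ≈ 0.58 times, so roughly half of the class-'1' instances are expected not to appear at all.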

answered Oct 13 '22 01:10 by Julian
