
Unseen nominal values in weka

I have a dataset with some nominal values as features. Some of the values that the nominal features take in my training set are absent from my test set. For instance, a feature in the training set is declared as

@attribute h4 {br,pl,com,ro,th,np}

and the same feature in the test set has

@attribute h4 {br,pl,abc,th,def,ghi,lmno}

I believe that because of this, Weka is not allowing me to re-evaluate the model I built on my training set against my test set. Is there a way around this? Am I missing something?

EDIT: I'm using a RandomForest classifier.

Thanks

DaTaBomB asked Nov 28 '13 05:11


1 Answer

Weka requires every nominal value used in the test set to also be declared in the training set, because the classifier has to learn from the training data before it can make predictions.

Weka also handles nominal values by their indices in the declared value list, so it is important to declare the values of an attribute in the same order in both files to get reliable results.
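To see why the order matters, here is a small illustration in plain Python. It is a simplified model of index-based encoding, not Weka's actual implementation, and the `encode` helper is hypothetical. It encodes a value against the test declaration and decodes the resulting index against the training declaration, using the two `h4` value lists from the question.

```python
# Value lists as declared in the question's two ARFF headers.
train_values = ["br", "pl", "com", "ro", "th", "np"]
test_values = ["br", "pl", "abc", "th", "def", "ghi", "lmno"]


def encode(value, declared_values):
    """Hypothetical helper: map a nominal value to its position in the declaration."""
    return declared_values.index(value)


# "th" is encoded against the test header...
index = encode("th", test_values)

# ...but a model trained on the training header reads that index differently.
decoded = train_values[index]

print(index, decoded)  # prints "3 ro"
```

With mismatched declarations, the same index refers to different values, which is why both files should share one declaration.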

In your case, just declare the same list of values, one that covers all values from both sets, in the same order in both the training set and the test set.

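Your combined values {br,pl,com,ro,th,np,abc,def,ghi,lmno} can be used for both the training set and the test set. Note that th appears only once, since a nominal value should not be repeated within a declaration.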
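If there are many attributes, building the combined list by hand is error-prone. The following is a minimal sketch in plain Python, with no Weka dependency. The function names `merge_nominal_values` and `attribute_line` are made up for illustration, and it assumes the values contain no spaces or other characters that would need quoting in ARFF. It keeps the training values first, in their original order, and appends the values that only occur in the test set.

```python
def merge_nominal_values(train_values, test_values):
    """Return training values first, then unseen test values, without duplicates."""
    merged = []
    seen = set()
    for value in list(train_values) + list(test_values):
        if value not in seen:
            seen.add(value)
            merged.append(value)
    return merged


def attribute_line(name, values):
    """Format an ARFF nominal attribute declaration (assumes no quoting is needed)."""
    return "@attribute {} {{{}}}".format(name, ",".join(values))


train = ["br", "pl", "com", "ro", "th", "np"]
test = ["br", "pl", "abc", "th", "def", "ghi", "lmno"]

merged = merge_nominal_values(train, test)
line = attribute_line("h4", merged)
print(line)  # prints "@attribute h4 {br,pl,com,ro,th,np,abc,def,ghi,lmno}"
```

The printed line would then replace the `h4` declaration in the header of both files; the data rows themselves stay unchanged.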

Gökhan Çoban answered Nov 04 '22 15:11