XGBoost: The least populated class in y has only 1 members, which is too few

I'm using the XGBoost implementation for sklearn in a Kaggle competition. However, I'm getting this 'warning' message:

$ python Script1.py /home/sky/private/virtualenv15.0.1dev/myVE/local/lib/python2.7/site-packages/sklearn/cross_validation.py:516:

Warning: The least populated class in y has only 1 members, which is too few. The minimum number of labels for any class cannot be less than n_folds=3. % (min_labels, self.n_folds)), Warning)

According to another question on Stack Overflow: "Check that you have at least 3 samples per class to be able to do StratifiedKFold cross validation with k == 3 (I think this is the default CV used by GridSearchCV for classification)."

And, well, I don't have at least 3 samples per class.
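To see which classes fall below the fold count, a quick tally of the labels is enough. The snippet below is a minimal sketch using only the standard library; the helper name classes_below_min and the toy labels are made up for illustration and stand in for the real place_id column.

```python
from collections import Counter


def classes_below_min(labels, n_folds=3):
    """Return {label: count} for every class with fewer than n_folds samples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_folds}


# Toy labels standing in for place_id (values are invented for illustration).
toy_place_id = [10, 10, 10, 20, 20, 30]

# Classes 20 and 30 have fewer than 3 samples, so stratified 3-fold CV
# cannot place one sample of each class in every fold.
print(classes_below_min(toy_place_id, n_folds=3))  # {20: 2, 30: 1}
```

An empty result means every class has at least n_folds samples.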

So my questions are:

a) What are the alternatives?

b) Why can't I use cross validation?

c) What can I use instead?

...
param_test1 = {
    'max_depth': range(3, 10, 2),
    'min_child_weight': range(1, 6, 2)
}

grid_search = GridSearchCV(
    estimator=XGBClassifier(
        learning_rate=0.1,
        n_estimators=3000,
        max_depth=15,
        min_child_weight=1,
        gamma=0,
        subsample=0.8,
        colsample_bytree=0.8,
        objective='multi:softmax',
        nthread=42,
        scale_pos_weight=1,
        seed=27),
    param_grid=param_test1,
    scoring='roc_auc',
    n_jobs=42,
    iid=False,
    cv=None,
    verbose=1)
...

grid_search.fit(train_x, place_id)

References:

One-shot learning with scikit-learn

Using a support vector classifier with polynomial kernel in scikit-learn

KenobiBastila asked May 15 '16 15:05

1 Answer

If you have a target/class with only one sample, that's too few for any model. What you can do is get another dataset, preferably as balanced as possible, since most models behave better on balanced sets.

If you cannot get another dataset, you will have to work with what you have. I would suggest removing the sample that has the lonely target. You will then have a model that does not cover that target. If that does not fit your requirements, you need a new dataset.
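Removing the lonely samples could look roughly like this. It is a minimal sketch in plain Python, not a definitive implementation: the helper name drop_rare_classes, the min_count threshold, and the toy data are assumptions made for illustration, with toy_x and toy_y standing in for train_x and place_id.

```python
from collections import Counter


def drop_rare_classes(samples, labels, min_count=3):
    """Keep only the (sample, label) pairs whose class has at least min_count members."""
    counts = Counter(labels)
    kept = [(x, y) for x, y in zip(samples, labels) if counts[y] >= min_count]
    kept_samples = [x for x, _ in kept]
    kept_labels = [y for _, y in kept]
    return kept_samples, kept_labels


# Toy data (invented): class "b" has a single sample and gets dropped.
toy_x = [[0.1], [0.2], [0.3], [0.4]]
toy_y = ["a", "a", "a", "b"]

filtered_x, filtered_y = drop_rare_classes(toy_x, toy_y, min_count=3)
print(filtered_y)  # ['a', 'a', 'a']
```

Setting min_count to the number of folds keeps every remaining class large enough for stratified splitting. The resulting model can never predict the dropped classes.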

Rabbit answered Oct 24 '22 23:10
