
XGBoost Error when using xgboost function

Tags: r, xgboost

Here is my code:

xgb <- xgboost(data = as.matrix(df_all_combined), 
               label = as.matrix(target_train), 
               eta = 0.1,
               max_depth = 15, 
               nround=100, 
               subsample = 0.5,
               colsample_bytree = 0.5,
               seed = 1,
               eval_metric = "auc",
               objective = "binary:logistic",
               num_class = 12,
               nthread = 3)

I get the following error:

Error in xgb.iter.update(bst$handle, dtrain, iteration - 1, obj) : [09:17:34] amalgamation/../src/objective/regression_obj.cc:90: Check failed: (preds.size()) == (info.labels.size()) labels are not correctly providedpreds.size=840756, label.size=70063

Could anyone help me solve this? I can't figure out what is causing it.

gayathri dornadula asked Jan 21 '17 04:01



2 Answers

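Try removing num_class = 12 from your parameters. objective = "binary:logistic" produces one prediction per row, while num_class = 12 makes xgboost allocate 12 predictions per row (70063 × 12 = 840756, which is the preds.size in the error). num_class is only meant for the multi-class objectives such as multi:softmax and multi:softprob.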
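A minimal sketch of the same settings without num_class, in case it helps to see the call spelled out. The data here is synthetic and hypothetical (features and labels stand in for as.matrix(df_all_combined) and target_train, which the question does not show), and it goes through xgb.DMatrix and xgb.train rather than the xgboost() wrapper used in the question. The training step only runs if the xgboost package is installed.

```r
# Hypothetical stand-in data; the real df_all_combined and target_train
# are not shown in the question.
set.seed(1)
n <- 200
features <- matrix(rnorm(n * 5), nrow = n)  # stands in for as.matrix(df_all_combined)
labels <- rbinom(n, size = 1, prob = 0.5)   # stands in for target_train, coded 0/1

# Same settings as the question, minus num_class.
params <- list(
  objective = "binary:logistic",
  eval_metric = "auc",
  eta = 0.1,
  max_depth = 15,
  subsample = 0.5,
  colsample_bytree = 0.5,
  seed = 1,
  nthread = 3
)

if (requireNamespace("xgboost", quietly = TRUE)) {
  # The label is passed as a plain vector with one entry per row.
  dtrain <- xgboost::xgb.DMatrix(data = features, label = labels)
  bst <- xgboost::xgb.train(params = params, data = dtrain, nrounds = 100)
  preds <- predict(bst, dtrain)
  # A binary objective yields exactly one prediction per row.
  stopifnot(length(preds) == nrow(features))
}
```

If the target really has 12 classes, the alternative would be to keep num_class = 12 and switch to a multi-class objective and metric, with labels coded 0 to 11.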

Sara answered Sep 21 '22 22:09


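The error says: labels are not correctly provided, preds.size=840756, label.size=70063

This means that the number of predictions xgboost produced (840756) does not match the number of labels in target_train (70063).

target_train must supply exactly one label per row of df_all_combined. Note that 840756 = 12 × 70063, so the extra predictions come from num_class = 12 rather than from missing rows: with 70063 rows and no num_class, both sizes would be 70063.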
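The two sizes in the error message can be compared directly in base R, and the same kind of check can be run on the data before training. The helper check_labels below is hypothetical (it is not part of xgboost); it is only a sketch of a sanity check, assuming the features are a matrix and the labels are a vector or a one-column matrix.

```r
# Sizes reported in the error message, and num_class from the question's call.
preds_size <- 840756
label_size <- 70063
num_class <- 12

# The prediction count is an exact multiple of the label count.
stopifnot(preds_size %% label_size == 0)
ratio <- preds_size / label_size
print(ratio)  # 12

# Hypothetical helper: fail early if labels and rows do not line up.
check_labels <- function(features, labels) {
  labels <- as.vector(labels)  # flattens a one-column matrix to a vector
  if (length(labels) != nrow(features)) {
    stop(sprintf("got %d labels for %d rows", length(labels), nrow(features)))
  }
  invisible(labels)
}

# Usage with small made-up data: 4 rows, 4 labels passes silently.
x <- matrix(1:8, nrow = 4)
y <- matrix(c(0, 1, 1, 0), ncol = 1)
flat <- check_labels(x, y)
```

Running check_labels(as.matrix(df_all_combined), as.matrix(target_train)) before training would show whether the row counts really differ or whether the mismatch only appears once num_class is set.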

Abhijay Ghildyal answered Sep 17 '22 22:09