
XGBoost: what's wrong with my xgb.cv call syntax?

Tags: python, xgboost

I'm trying to use xgboost on Python.

Here is my code. xgb.train works, but I get an error with xgb.cv even though I seem to be calling it correctly.

The following works for me:

###### XGBOOST ######

import datetime
startTime = datetime.datetime.now()

import numpy as np
import xgboost as xgb
data_train   = np.array(traindata.drop('Category',axis=1))
labels_train = np.array(traindata['Category'].cat.codes)

data_valid   = np.array(validdata.drop('Category',axis=1))
labels_valid = np.array(validdata['Category'].astype('category').cat.codes)

weights_train = np.ones(len(labels_train))
weights_valid  = np.ones(len(labels_valid ))

dtrain = xgb.DMatrix( data_train, label=labels_train,weight = weights_train)
dvalid  = xgb.DMatrix( data_valid , label=labels_valid ,weight = weights_valid )

param = {'bst:max_depth':5, 'bst:eta':0.05, # eta [default=0.3]
         #'min_child_weight':1,'gamma':0,'subsample':1,'colsample_bytree':1,'scale_pos_weight':0, # default
         # max_delta_step:0 # default
         'min_child_weight':5,'scale_pos_weight':0, 'max_delta_step':2,
         'subsample':0.8,'colsample_bytree':0.8,
         'silent':1, 'objective':'multi:softprob' }

param['nthread'] = 4
param['eval_metric'] = 'mlogloss'
param['lambda'] = 2
param['num_class']=39

evallist  = [(dtrain,'train'),(dvalid,'eval')] # if there is a validation set
# evallist  = [(dtrain,'train')]                   # if there is no validation set

plst = list(param.items())  # list() so that += below also works on Python 3
plst += [('ams@0','eval_metric')]

num_round = 100

bst = xgb.train( plst, dtrain, num_round, evallist,early_stopping_rounds=5 ) # early_stopping_rounds=10 # when there is a validation set

# bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)

bst.save_model('0001.model')

# dump model
bst.dump_model('dump.raw.txt')
# dump model with feature map
# bst.dump_model('dump.raw.txt','featmap.txt')

x = datetime.datetime.now() - startTime
print(x)

But if I change the line:

bst = xgb.train( plst, dtrain, num_round, evallist,early_stopping_rounds=5 )

to this:

bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)

I get the following unexpected error:

File "<ipython-input-46-ebdf0546f464>", line 45
    bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)
SyntaxError: non-keyword arg after keyword arg

EDIT: following the advice below from @martineau, and trying this

bst.res=xgb.cv(plst,dtrain,num_round,evallist,nfold = 5,early_stopping_rounds=5)

yields this error

TypeError                                 Traceback (most recent call last)
in ()
     43 # bst = xgb.train( plst, dtrain, num_round, evallist,early_stopping_rounds=5 ) # early_stopping_rounds=10 # when there is a validation set
     44
---> 45 bst.res=xgb.cv(plst,dtrain,num_round,evallist,nfold = 5,early_stopping_rounds=5)
     46
     47 bst.save_model('0001.model')

TypeError: cv() got multiple values for keyword argument 'nfold'

Fagui Curtain asked Jun 06 '16 01:06


1 Answer

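You can't use evallist in cv: xgb.cv does not take an evaluation list, because it builds its own train/test split for each fold from dtrain. Its fourth positional parameter is nfold, so passing evallist in that position while also writing nfold=5 is what produces "got multiple values for keyword argument 'nfold'". Remove evallist from the arguments of the xgb.cv call. Put another way, you should try: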

bst.res = xgb.cv(plst, dtrain, num_round, nfold=5, early_stopping_rounds=5)

instead of

bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)
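To see where the two error messages in the question come from without needing xgboost at all, here is a minimal sketch. The function `cv` below is a hypothetical stand-in, not the library function: it only copies the order of the first few parameters of xgb.cv (params, dtrain, num_boost_round, nfold) and returns the value that ended up in nfold.

```python
# Hypothetical stand-in: mimics only the leading parameter order of xgb.cv.
def cv(params, dtrain, num_boost_round=10, nfold=3, early_stopping_rounds=None):
    return nfold

evallist = [("dtrain", "train")]  # placeholder for the real evaluation list
syntax_error = None
type_error = None

# 1) A positional argument after a keyword argument is rejected when the
#    source is compiled, before any function is called.
bad_source = "cv(plst, dtrain, num_round, nfold=5, evallist, early_stopping_rounds=5)"
try:
    compile(bad_source, "<example>", "eval")
except SyntaxError as err:
    syntax_error = err
    print("SyntaxError:", err.msg)

# 2) Moving evallist forward makes it the fourth positional argument, which
#    is the nfold slot, so nfold=5 supplies that parameter a second time.
try:
    cv({}, None, 100, evallist, nfold=5, early_stopping_rounds=5)
except TypeError as err:
    type_error = err
    print("TypeError:", err)

# 3) Without evallist, nfold is given exactly once.
fixed = cv({}, None, 100, nfold=5, early_stopping_rounds=5)
print("nfold received:", fixed)
```

The exact wording of the messages depends on the Python version, but the first failure is always a SyntaxError and the second a TypeError that names nfold.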

Chris, the Python training API changed slightly between the pip release and the current master branch on GitHub. The main change is that the cv function gained the keyword arguments verbose_eval, callbacks and folds. The train function already accepted verbose_eval and callbacks in the pip release; cv did not.

Adrien Renaud answered Nov 12 '22 17:11