I'm trying to use xgboost in Python.
Here is my code. xgb.train works, but I get an error with xgb.cv, although it seems I used it the correct way.
The following works for me:
###### XGBOOST ######
import datetime
startTime = datetime.datetime.now()
import numpy as np
import xgboost as xgb
data_train = np.array(traindata.drop('Category',axis=1))
labels_train = np.array(traindata['Category'].cat.codes)
data_valid = np.array(validdata.drop('Category',axis=1))
labels_valid = np.array(validdata['Category'].astype('category').cat.codes)
weights_train = np.ones(len(labels_train))
weights_valid = np.ones(len(labels_valid ))
dtrain = xgb.DMatrix( data_train, label=labels_train,weight = weights_train)
dvalid = xgb.DMatrix( data_valid , label=labels_valid ,weight = weights_valid )
param = {'bst:max_depth':5, 'bst:eta':0.05, # eta [default=0.3]
#'min_child_weight':1,'gamma':0,'subsample':1,'colsample_bytree':1,'scale_pos_weight':0, # default
# max_delta_step:0 # default
'min_child_weight':5,'scale_pos_weight':0, 'max_delta_step':2,
'subsample':0.8,'colsample_bytree':0.8,
'silent':1, 'objective':'multi:softprob' }
param['nthread'] = 4
param['eval_metric'] = 'mlogloss'
param['lambda'] = 2
param['num_class']=39
evallist = [(dtrain,'train'),(dvalid,'eval')] # if there is a validation set
# evallist = [(dtrain,'train')] # if there is no validation set
plst = list(param.items()) # list() so that the += below also works on Python 3
plst += [('ams@0','eval_metric')]
num_round = 100
bst = xgb.train( plst, dtrain, num_round, evallist,early_stopping_rounds=5 ) # early_stopping_rounds=10 # when there is a validation set
# bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)
bst.save_model('0001.model')
# dump model
bst.dump_model('dump.raw.txt')
# dump model with feature map
# bst.dump_model('dump.raw.txt','featmap.txt')
x = datetime.datetime.now() - startTime
print(x)
But if I change the line:
bst = xgb.train( plst, dtrain, num_round, evallist,early_stopping_rounds=5 )
to this:
bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)
I get the following unexpected error:
File "<ipython-input-46-ebdf0546f464>", line 45
    bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)
SyntaxError: non-keyword arg after keyword arg
EDIT: following the advice below from @martineau, and trying this
bst.res=xgb.cv(plst,dtrain,num_round,evallist,nfold = 5,early_stopping_rounds=5)
yields this error
TypeError Traceback (most recent call last)
in ()
     43 # bst = xgb.train( plst, dtrain, num_round, evallist,early_stopping_rounds=5 ) # early_stopping_rounds=10 # when there is a validation set
     44
---> 45 bst.res=xgb.cv(plst,dtrain,num_round,evallist,nfold = 5,early_stopping_rounds=5)
     46
     47 bst.save_model('0001.model')
TypeError: cv() got multiple values for keyword argument 'nfold'
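You can't use evallist in cv: xgb.cv builds its own train/test folds from dtrain and has no parameter for a watchlist. Its fourth positional parameter is nfold, which is why passing evallist positionally together with nfold = 5 gives "multiple values for keyword argument 'nfold'".
So you should remove evallist from the arguments of the xgb.cv call.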
Put another way, you should try:
bst.res = xgb.cv(plst, dtrain, num_round, nfold=5, early_stopping_rounds=5)
instead of
bst.res=xgb.cv(plst,dtrain,num_round,nfold = 5,evallist,early_stopping_rounds=5)
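To see why both attempts from the question fail, here is a minimal sketch that needs no xgboost at all. fake_cv is a hypothetical stand-in that only mirrors the leading parameter order of xgb.cv (params, dtrain, num_boost_round, nfold) as I understand it; it is not the library function, and the placeholder values are made up. The first attempt is rejected by the Python parser because a positional argument follows a keyword argument. The second one parses, but evallist lands in the fourth positional slot, which is nfold, so nfold receives two values.

```python
def fake_cv(params, dtrain, num_boost_round=10, nfold=3, early_stopping_rounds=None):
    """Hypothetical stand-in for xgb.cv: echoes back how the arguments were bound."""
    return {"num_boost_round": num_boost_round,
            "nfold": nfold,
            "early_stopping_rounds": early_stopping_rounds}

# Placeholder values (assumptions, not real training data).
plst, dtrain, num_round = [("eta", 0.05)], object(), 100
evallist = [(dtrain, "train")]

# Attempt 1: positional argument after a keyword argument -> SyntaxError at parse time.
# The call is kept in a string so that this example itself still parses.
attempt_1 = "fake_cv(plst, dtrain, num_round, nfold=5, evallist, early_stopping_rounds=5)"
try:
    compile(attempt_1, "<attempt 1>", "eval")
    first_error = None
except SyntaxError as exc:
    first_error = type(exc).__name__

# Attempt 2: evallist fills the nfold slot, then nfold=5 is passed again -> TypeError.
try:
    fake_cv(plst, dtrain, num_round, evallist, nfold=5, early_stopping_rounds=5)
    second_error = None
except TypeError as exc:
    second_error = type(exc).__name__

# Fix: drop evallist and pass the remaining options by keyword.
bound = fake_cv(plst, dtrain, num_round, nfold=5, early_stopping_rounds=5)

print(first_error, second_error, bound)
```

The exact wording of the two error messages differs between Python versions, so the sketch only records the exception types.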
Chris, the Python training API changed slightly between the pip version and the current master branch on GitHub. They mainly added the keyword arguments verbose_eval, callbacks and folds to the cv function. The verbose_eval and callbacks keywords were already there in the pip version for the train function, but not for the cv one.
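Because of differences like this between releases, it can help to check which keyword arguments the installed function actually accepts before passing them. The sketch below does that with inspect.signature from the standard library (Python 3). supported_kwargs is a hypothetical helper name, and old_cv is a made-up stand-in for an older cv signature so that the example runs without xgboost.

```python
import inspect

def supported_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` declares in its signature."""
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}

def old_cv(params, dtrain, num_boost_round=10, nfold=3, early_stopping_rounds=None):
    """Made-up stand-in for an older cv() without verbose_eval, callbacks or folds."""
    return nfold

wanted = {"nfold": 5, "early_stopping_rounds": 5, "verbose_eval": True}
usable = supported_kwargs(old_cv, **wanted)   # verbose_eval is filtered out
print(usable)

result = old_cv({}, None, 100, **usable)
```

With the real library one would pass xgb.cv instead of old_cv. That assumes inspect.signature can introspect it, which holds for plain Python functions.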