
XGBoost R vs python - different performance and feature importance

Tags: python, r, xgboost

I have a problem with XGBoost, which I use at work. My task is to port a piece of code that currently runs in R to Python.

What the code does: my aim is to use XGBoost to determine the features with the most gain. I made sure the inputs to XGBoost are identical in R and Python. XGBoost is run roughly 100 times (on different data), and each time I extract the 30 best features by gain.

My problem is this: the inputs in R and Python are identical, yet Python and R output vastly different features (both in the total number of features per round and in which features are chosen). They share only about 50% of features. My parameters are the same, and I don't use any subsampling, so there should be no randomness.

Another thing I noticed: XGBoost is slower in Python than in R with the same parameters. Is this a known issue?

R parameters

Python parameters

I've been looking around but haven't found anyone with a similar problem. I can't share the data or code because it's confidential. Does anyone have an idea why the features differ so much?

R version: 3.4.3

XGBoost R version: 0.6.4.1

python version: 3.6.5

XGBoost python version: 0.71

Running on Windows.

Johny asked Nov 07 '22 06:11


1 Answer

You set the internal seed in the R code but not the Python code.
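For reference, here is a minimal sketch of fixing the seed on the Python side, assuming the native `xgb.train` API is being used. The parameter values, the `dtrain` argument, and the helper names `train_with_seed` and `top_features_by_gain` are hypothetical, since the original parameters are not shown; `seed` and `Booster.get_score` are part of the XGBoost Python package.

```python
def top_features_by_gain(gain_scores, k=30):
    """Return the k feature names with the highest gain, best first."""
    # Sort by gain (descending); break ties by name so the order is deterministic.
    ranked = sorted(gain_scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def train_with_seed(dtrain, seed=42, num_boost_round=100, k=30):
    """Train a booster with an explicit seed and return its top-k features.

    `dtrain` is assumed to be an xgboost.DMatrix built from your own data.
    """
    import xgboost as xgb  # imported lazily; requires the xgboost package

    # Placeholder parameters: replace with the ones used in the R code.
    params = {
        "max_depth": 6,
        "eta": 0.3,
        "seed": seed,  # the internal seed, set explicitly
    }
    booster = xgb.train(params, dtrain, num_boost_round=num_boost_round)

    # 'gain' is the average gain of the splits that use each feature.
    # Features never used in a split are absent from the returned dict,
    # so the number of features can differ from run to run.
    gain_scores = booster.get_score(importance_type="gain")
    return top_features_by_gain(gain_scores, k=k)
```

Sorting with an explicit tie-break keeps the top-30 list stable when two features have equal gain, which makes run-to-run comparisons easier to read.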

A bigger issue is likely that Python and R may also use different random number generators, so even if you always set both the internal and external seeds you could still get different sequences. This thread may help in that respect.

I would also hazard a guess that the variables not selected in one model provide similar information to those selected in the other, so swapping one for another shouldn't affect model performance significantly. That said, do the R model and the Python model actually perform the same?
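One way to test that guess is to measure how much the two top-30 lists overlap, and then check whether each feature that appears in only one list is strongly correlated with some feature in the other. Below is a rough sketch in plain Python. The function names and the toy data are made up for illustration, and a high correlation only suggests, rather than proves, that two features are interchangeable.

```python
import math


def overlap_fraction(features_a, features_b):
    """Fraction of the smaller feature set that also appears in the other."""
    set_a, set_b = set(features_a), set(features_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def best_substitutes(only_in_a, features_b, columns):
    """For each feature unique to list A, find the most correlated feature in B.

    `columns` maps a feature name to its list of values.
    Returns {feature: (closest_feature_in_b, absolute_correlation)}.
    """
    result = {}
    for name in only_in_a:
        scored = [
            (abs(pearson(columns[name], columns[other])), other)
            for other in features_b
        ]
        if scored:
            corr, partner = max(scored)
            result[name] = (partner, corr)
    return result


# Toy data: f2 is an exact multiple of f1, f3 is unrelated.
columns = {"f1": [1, 2, 3, 4], "f2": [2, 4, 6, 8], "f3": [4, 1, 3, 2]}
r_top = ["f1", "f3"]   # hypothetical features picked by the R model
py_top = ["f2", "f3"]  # hypothetical features picked by the Python model

only_in_r = [f for f in r_top if f not in py_top]
print(overlap_fraction(r_top, py_top))
print(best_substitutes(only_in_r, py_top, columns))
```

If most of the unshared features turn out to have a close substitute in the other list, and both models score similarly on held-out data, the difference in selected features is probably not a practical concern.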

Shea Connell answered Nov 14 '22 22:11