I am new to XGBoost in Python, so I apologize if the answer here is obvious, but I am trying to take a pandas DataFrame and get XGBoost in Python to give me the same predictions I get when I use the scikit-learn wrapper for the same exercise. So far I have been unable to do so. Just to give an example, here I take the Boston dataset, convert it to a pandas DataFrame, train on the first 500 observations of the dataset and then predict the last 6. I do it with XGBoost first and then with the scikit-learn wrapper, and I get different predictions even though I have set the parameters of the model to be the same. Specifically, the array predictions looks very different from the array predictions2 (see code below). Any help would be much appreciated!
from sklearn import datasets
import pandas as pd
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from xgboost.sklearn import XGBRegressor
### Use the boston data as an example, train on first 500, predict last 6
boston_data = datasets.load_boston()
df_boston = pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
df_boston['target'] = pd.Series(boston_data.target)
#### Code using XGBoost
Sub_train = df_boston.head(500)
target = Sub_train["target"]
Sub_train = Sub_train.drop('target', axis=1)
Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)
xgtrain = xgb.DMatrix(Sub_train.values, label=target.tolist())
xgtest = xgb.DMatrix(Sub_predict.values)
params = {'booster': 'gblinear', 'objective': 'reg:linear',
'max_depth': 2, 'learning_rate': .1, 'n_estimators': 500, 'min_child_weight': 3, 'colsample_bytree': .7,
'subsample': .8, 'gamma': 0, 'reg_alpha': 1}
model = xgb.train(dtrain=xgtrain, params=params)
predictions = model.predict(xgtest)
#### Code using Sk learn Wrapper for XGBoost
model = XGBRegressor(learning_rate=.1, n_estimators=500,
                     max_depth=2, min_child_weight=3, gamma=0,
                     subsample=.8, colsample_bytree=.7, reg_alpha=1,
                     objective='reg:linear')
target = "target"
Sub_train = df_boston.head(500)
Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)
Ex_List = ['target']
predictors = [i for i in Sub_train.columns if i not in Ex_List]
model = model.fit(Sub_train[predictors],Sub_train[target])
predictions2 = model.predict(Sub_predict)
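For reference, this is how I am comparing the two outputs (just printing the arrays; predictions and predictions2 come from the two snippets above):
print(predictions)     # from xgb.train
print(predictions2)    # from XGBRegressor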
Please look at this answer here: xgboost.train will ignore the parameter n_estimators, while xgboost.XGBRegressor accepts it. In xgboost.train, the number of boosting iterations (i.e. n_estimators) is controlled by num_boost_round (default: 10). It suggests removing n_estimators from the params supplied to xgb.train and replacing it with num_boost_round.
So change your params like this:
params = {'objective': 'reg:linear',
'max_depth': 2, 'learning_rate': .1,
'min_child_weight': 3, 'colsample_bytree': .7,
'subsample': .8, 'gamma': 0, 'alpha': 1}
And call xgb.train like this:
model = xgb.train(dtrain=xgtrain, params=params, num_boost_round=500)
And you will get the same results.
Alternatively, keep the xgb.train as it is and change the XGBRegressor like this:
model = XGBRegressor(learning_rate=.1, n_estimators=10,
                     max_depth=2, min_child_weight=3, gamma=0,
                     subsample=.8, colsample_bytree=.7, reg_alpha=1,
                     objective='reg:linear')
Then you will also get the same results.
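As a quick sanity check (a minimal sketch, assuming predictions and predictions2 from the code above are still in scope), you can compare the two arrays directly:
import numpy as np
# once num_boost_round and n_estimators agree, the two prediction
# arrays should match up to floating-point precision
print(np.allclose(predictions, predictions2))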