
How to get Predictions with XGBoost and XGBoost using Scikit-Learn Wrapper to match?

I am new to XGBoost in Python, so I apologize if the answer here is obvious. I am trying to take a pandas DataFrame and get the native XGBoost API (xgb.train) to give me the same predictions I get from the Scikit-Learn wrapper (XGBRegressor) on the same task, but so far I've been unable to. As an example, I take the Boston dataset, convert it to a pandas DataFrame, train on the first 500 observations and then predict the last 6. I do this first with XGBoost and then with the Scikit-Learn wrapper, and I get different predictions even though I've set the model parameters to be the same. Specifically, the array predictions looks very different from the array predictions2 (see code below). Any help would be much appreciated!

from sklearn import datasets
import pandas as pd
import xgboost as xgb
from xgboost.sklearn import XGBClassifier
from xgboost.sklearn import XGBRegressor

### Use the boston data as an example, train on first 500, predict last 6 
boston_data = datasets.load_boston()
df_boston = pd.DataFrame(boston_data.data,columns=boston_data.feature_names)
df_boston['target'] = pd.Series(boston_data.target)

#### Code using XGBoost
Sub_train = df_boston.head(500)
target = Sub_train["target"]
Sub_train = Sub_train.drop('target', axis=1) 

Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)  

xgtrain = xgb.DMatrix(Sub_train.as_matrix(), label=target.tolist())
xgtest = xgb.DMatrix(Sub_predict.as_matrix())

params = {'booster': 'gblinear', 'objective': 'reg:linear',
          'max_depth': 2, 'learning_rate': .1, 'n_estimators': 500,
          'min_child_weight': 3, 'colsample_bytree': .7,
          'subsample': .8, 'gamma': 0, 'reg_alpha': 1}

model = xgb.train(dtrain=xgtrain, params=params)

predictions = model.predict(xgtest)

#### Code using Sk learn Wrapper for XGBoost
model = XGBRegressor(learning_rate=.1, n_estimators=500,
                     max_depth=2, min_child_weight=3, gamma=0,
                     subsample=.8, colsample_bytree=.7, reg_alpha=1,
                     objective='reg:linear')

target = "target"

Sub_train = df_boston.head(500)
Sub_predict = df_boston.tail(6)
Sub_predict = Sub_predict.drop('target', axis=1)

Ex_List = ['target']

predictors = [i for i in Sub_train.columns if i not in Ex_List]

model = model.fit(Sub_train[predictors],Sub_train[target])

predictions2 = model.predict(Sub_predict)
Joseph E, asked Oct 25 '17 23:10

1 Answer

Please see this related answer, which says:

xgboost.train will ignore parameter n_estimators, while xgboost.XGBRegressor accepts. In xgboost.train, boosting iterations (i.e. n_estimators) is controlled by num_boost_round(default: 10)

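It suggests removing n_estimators from the params passed to xgb.train and setting num_boost_round instead. The params below also drop 'booster': 'gblinear', so that xgb.train uses the same default tree booster as XGBRegressor, and they use alpha, the native name for reg_alpha.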

So change your params like this:

params = {'objective': 'reg:linear', 
      'max_depth': 2, 'learning_rate': .1,    
      'min_child_weight': 3, 'colsample_bytree': .7,
      'subsample': .8, 'gamma': 0, 'alpha': 1}

And train xgb.train like this:

model = xgb.train(dtrain=xgtrain, params=params,num_boost_round=500)

You will then get the same results.

Alternatively, leave num_boost_round at its default of 10 in xgb.train and change XGBRegressor to match:

model = XGBRegressor(learning_rate =.1, n_estimators=10,
                     max_depth=2, min_child_weight=3, gamma=0, 
                     subsample=.8, colsample_bytree=.7, reg_alpha=1, 
                     objective= 'reg:linear')

This also gives the same results.

Vivek Kumar, answered Oct 05 '22 00:10