Load and predict new data sklearn

Tags: python, machine-learning, scikit-learn, logistic-regression, joblib

I trained a logistic regression model, cross-validated it, and saved it to a file using the joblib module. Now I want to load this model and predict new data with it. Is this the correct way to do it, especially the standardization? Should I call scaler.fit() on my new data too? In the tutorials I followed, scaler.fit() was only used on the training set, so I'm a bit lost here.

Here is my code:

import joblib
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Loading the saved model with joblib
model = joblib.load('model.pkl')

# New data to predict
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]

# Standardize new data (this is the step I am unsure about)
scaler = StandardScaler()
X_pred = scaler.fit(pr[pred_cols]).transform(pr[pred_cols])

pred = pd.Series(model.predict(X_pred))
print(pred)

asked Nov 21 '17 15:11

Marcos Santana

1 Answer

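No, it's incorrect. All the data preparation steps should be fit on the training data only. Otherwise, you risk applying the wrong transformation, because the means and variances that StandardScaler estimates will probably differ between the training data and the new data. The model learned its coefficients on features scaled with the training statistics, so new data has to be scaled with those same statistics: call transform() on it, not fit().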
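
To make the difference concrete, here is a minimal sketch on synthetic data. The arrays `X_train` and `X_new` are made-up stand-ins (they are not from the question), and the new rows are deliberately drawn from a shifted distribution so the effect is visible.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the real data (assumption, not from the question).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=10.0, scale=2.0, size=(200, 3))
X_new = rng.normal(loc=12.0, scale=2.0, size=(5, 3))  # shifted distribution

# Correct: fit on the training data once, then only transform new data.
scaler = StandardScaler().fit(X_train)
X_new_ok = scaler.transform(X_new)

# Incorrect: refitting on the new data discards the training statistics.
X_new_refit = StandardScaler().fit_transform(X_new)

print(scaler.mean_)               # means learned from the training set
print(X_new_ok.mean(axis=0))      # new rows expressed relative to the training data
print(X_new_refit.mean(axis=0))   # ~0 by construction, whatever the new data looked like
```

Refitting always centers the new batch at zero, so any real shift between the training data and the new data is erased before the model sees it.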

The easiest way to train, save, load, and apply all the steps together is to use a Pipeline:

At training:

# prepare the pipeline
import joblib  # sklearn.externals.joblib was removed from recent scikit-learn releases
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# the scaler and the classifier are fit together, on the training data only
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
joblib.dump(pipe, 'model.pkl')

At prediction:

import joblib
import pandas as pd

# Loading the saved pipeline with joblib
pipe = joblib.load('model.pkl')

# New data to predict
pr = pd.read_csv('set_to_predict.csv')
pred_cols = list(pr.columns.values)[:-1]

# apply the whole pipeline to the data (scaling with the training statistics, then predicting)
pred = pd.Series(pipe.predict(pr[pred_cols]))
print(pred)
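
If you want to check that the reloaded pipeline really carries the training statistics, something along these lines should work. This is only a sketch: the data is synthetic, a temporary file stands in for 'model.pkl', and the variable names are my own.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (assumption): two features, label depends on their sum.
rng = np.random.default_rng(42)
X_train = rng.normal(loc=50.0, scale=10.0, size=(300, 2))
y_train = (X_train.sum(axis=1) > 100.0).astype(int)

pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# Round-trip through a temporary file instead of a fixed 'model.pkl' path.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(pipe, path)
    loaded = joblib.load(path)

# make_pipeline names each step after its lowercased class name.
loaded_scaler = loaded.named_steps["standardscaler"]
print(loaded_scaler.mean_)  # the training means, restored from disk

# predict() only transforms with the stored statistics, so raw rows can be passed in.
X_new = np.array([[30.0, 40.0], [70.0, 65.0]])
print(loaded.predict(X_new))
```

Because the scaler is stored inside the pipeline, there is no separate scaler object to save, reload, or accidentally refit at prediction time.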

answered Sep 22 '22 18:09

David Dale
