 

Predicting new data using sklearn after standardizing the training data

I am using Sklearn to build a linear regression model (or any other model) with the following steps:

X_train and Y_train are the training data.

  1. Standardize the training data

      X_train = preprocessing.scale(X_train)
    
  2. Fit the model

     model.fit(X_train, Y_train)
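For reference, preprocessing.scale centers each column on that array's own mean and divides by its population standard deviation (ddof=0). It returns only the transformed array. The minimal sketch below uses made-up numbers to check this and shows that no mean or standard deviation is kept for later use:

```python
import numpy as np
from sklearn import preprocessing

# Made-up training data: 4 samples, 2 features
X_train = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0],
                    [4.0, 60.0]])

X_scaled = preprocessing.scale(X_train)

# Same result computed by hand: per-column mean and population std (ddof=0)
manual = (X_train - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_scaled, manual))  # True

# The return value is a plain array; the statistics used are discarded
print(type(X_scaled))  # <class 'numpy.ndarray'>
```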
    

Once the model is fit with scaled data, how can I predict with new data (either one or more data points at a time) using the fit model?

What I am currently doing for prediction is:

  1. Scale the new data

    NewData_Scaled = preprocessing.scale(NewData)
    
  2. Predict the data

    PredictedTarget = model.predict(NewData_Scaled)
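Scaling new data on its own computes a fresh mean and standard deviation from the new data rather than reusing the training statistics. The sketch below uses the same made-up 3x3 training array as the answer. For a single sample the result is especially misleading: each column's mean is the value itself and its standard deviation is 0 (which scikit-learn treats as 1), so every feature becomes 0.

```python
import numpy as np
from sklearn import preprocessing

# Made-up training data (3 samples, 3 features)
X_train = np.array([[1., -1.,  2.],
                    [2.,  0.,  0.],
                    [0.,  1., -1.]])

# A single new sample, shaped (1, n_features)
new_data = np.array([[-1., 1., 0.]])

# Scaling the new sample by itself: every feature collapses to 0
print(preprocessing.scale(new_data))

# What the fitted model actually expects: the new sample standardized
# with the *training* mean and standard deviation
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
print((new_data - mean) / std)
```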
    

I think I am missing a transformation function, something like preprocessing.scale that I can save along with the trained model and then apply to new, unseen data. Any help would be appreciated.
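A common way to keep the scaling step and the model together is to wrap both in a scikit-learn Pipeline. This technique is not part of the question or the answer below. Fitting the pipeline fits StandardScaler on the training data only, and predict then reuses those stored statistics on raw new data. The sketch below assumes LinearRegression as the model and uses made-up, noise-free data. It serializes with the standard-library pickle module in memory; in practice you would write the bytes to a file, and joblib is another common choice.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up training data: y = 3*x0 - 2*x1 + 5 (no noise)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(100, 2))
Y_train = 3 * X_train[:, 0] - 2 * X_train[:, 1] + 5

# The scaler's mean and std are learned here, from X_train only
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, Y_train)

# Serialize scaler + regression as one object (in practice, save to a file)
blob = pickle.dumps(model)

# Later: restore it and predict directly on raw, unscaled new data
loaded = pickle.loads(blob)
new_data = np.array([[40.0, 55.0]])  # one sample, two features
print(loaded.predict(new_data))  # approximately [15.]
```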

asked Aug 05 '16 at 02:08 by S.AMEEN



People also ask

What is predict() in sklearn?

Scikit-learn provides tools for training and evaluating machine learning models. Once a model is trained, its predict() method returns predicted output values (for estimators that actually make predictions).

How do you predict new data values in Python?

Once a model has been trained, its predict() method returns predictions for new data based on what the model learned: labels for classifiers, numeric values for regressors. In scikit-learn, the main argument to predict() is the data to predict on, an array of shape (n_samples, n_features).

Does sklearn standardize data?

The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream estimators. In general, learning algorithms benefit from standardization of the data set.

How do you connect model input data with predictions for machine learning?

To give inputs to a trained model, create a NumPy array containing values for the same features, in the same order, that were used to train it, with the same preprocessing (for example, the same scaling) applied. Pass that array to the model's predict() method, which returns the predicted values for those inputs.
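In scikit-learn, predict expects a 2-D array of shape (n_samples, n_features), so a single observation has to be reshaped into one row. A minimal sketch with made-up data, assuming LinearRegression as the example model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: 2 features, target = x0 + 2*x1
X_train = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y_train = X_train[:, 0] + 2 * X_train[:, 1]
model = LinearRegression().fit(X_train, y_train)

# One new observation as a 1-D feature vector
sample = np.array([2., 3.])

# Reshape to (1, n_features); passing the 1-D array directly raises ValueError
print(model.predict(sample.reshape(1, -1)))  # approximately [8.]

# Several observations at once: one row per sample
batch = np.array([[2., 3.], [0., 0.5]])
print(model.predict(batch))  # approximately [8. 1.]
```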


1 Answer

Take a look at these docs.

You can use the StandardScaler class of the preprocessing module to remember the scaling of your training data so you can apply it to future values.

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])
# Learn each feature's mean and standard deviation from the training data
scaler = StandardScaler().fit(X_train)

scaler has now computed the mean and scaling factor (the standard deviation) of each feature from the training data:

>>> scaler.mean_
array([ 1. ...,  0. ...,  0.33...])
>>> scaler.scale_
array([ 0.81...,  0.81...,  1.24...])

To apply the same scaling to the training data and to new data, use transform:

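import numpy as np

# Standardize the training data with the statistics learned above
X_train_scaled = scaler.transform(X_train)

# New data must be 2-D: one row per sample, one column per feature
new_data = np.array([[-1., 1., 0.]])
new_data_scaled = scaler.transform(new_data)
>>> new_data_scaled
array([[-2.44...,  1.22..., -0.26...]])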
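To confirm that transform reuses the statistics learned during fit rather than recomputing them, you can compare its output with (x - mean_) / scale_. inverse_transform maps the result back to the original units. A short sketch on the same training data, with a second made-up sample added:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1.,  2.],
                    [2.,  0.,  0.],
                    [0.,  1., -1.]])
scaler = StandardScaler().fit(X_train)

# Two new samples, one per row
new_data = np.array([[-1., 1., 0.],
                     [ 3., 2., 1.]])
new_scaled = scaler.transform(new_data)

# transform applies the training mean and std stored on the scaler
print(np.allclose(new_scaled, (new_data - scaler.mean_) / scaler.scale_))  # True

# inverse_transform undoes the scaling
print(np.allclose(scaler.inverse_transform(new_scaled), new_data))  # True
```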
answered Oct 18 '22 at 13:10 by ilyas patanam
