Predicting new data using sklearn after standardizing the training data

Tags: python, machine-learning, scikit-learn

I am using scikit-learn to build a linear regression model (or any other model) with the following steps, where X_train and Y_train are the training data:

1. Standardize the training data:

```
X_train = preprocessing.scale(X_train)
```

2. Fit the model:

```
model.fit(X_train, Y_train)
```

Once the model is fit with scaled data, how can I predict with new data (either one or more data points at a time) using the fit model?

What I am currently doing is:

1. Scale the new data:

```
NewData_Scaled = preprocessing.scale(NewData)
```

2. Predict the target:

```
PredictedTarget = model.predict(NewData_Scaled)
```

I think I am missing a transformation function to go with preprocessing.scale, something I can save with the trained model and then apply to new, unseen data. Any help would be appreciated.


asked Aug 05 '16 02:08

S.AMEEN

1 Answer

Take a look at these docs.

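You can use the StandardScaler class of the preprocessing module to learn the scaling parameters (the per-feature mean and standard deviation) from your training data and store them, so you can apply exactly the same transformation to future values.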
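A quick way to see why this matters: preprocessing.scale recomputes the mean and standard deviation from whatever array it is given, so scaling new data on its own uses different statistics than the training data did. In the extreme case of a single sample, every column has zero variance and the result is all zeros regardless of the input. A minimal sketch, reusing the small example matrix below (the new point's values are arbitrary):

```python
import numpy as np
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
new_point = np.array([[-1., 1., 0.]])  # one sample, shape (1, 3)

# Scaling the new point on its own: each column's mean is the value itself
# and its standard deviation is 0 (treated as 1), so every feature becomes 0.
scaled_alone = preprocessing.scale(new_point)

# Scaling with the statistics learned from the training data keeps the information.
scaler = StandardScaler().fit(X_train)
scaled_with_train_stats = scaler.transform(new_point)

print(scaled_alone)             # [[0. 0. 0.]]
print(scaled_with_train_stats)  # approximately [[-2.449  1.225 -0.267]]
```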

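```
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[ 1., -1.,  2.],
                    [ 2.,  0.,  0.],
                    [ 0.,  1., -1.]])
# Learn the per-feature mean and standard deviation from the training data
scaler = StandardScaler().fit(X_train)
```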

scaler has calculated the mean and scaling factor to standardize each feature.

```
>>> scaler.mean_
array([ 1. ...,  0. ...,  0.33...])
>>> scaler.scale_
array([ 0.81...,  0.81...,  1.24...])
```
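If you want to confirm what those attributes contain, the sketch below recomputes them with plain NumPy. It assumes, as I understand StandardScaler's behaviour, that scale_ is the population standard deviation (ddof=0, NumPy's default); with the sample standard deviation (ddof=1) the first two columns would come out as exactly 1.0 instead of 0.81...:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])
scaler = StandardScaler().fit(X_train)

# Column-wise mean and population standard deviation (ddof=0)
manual_mean = X_train.mean(axis=0)
manual_scale = X_train.std(axis=0)

print(np.allclose(scaler.mean_, manual_mean))    # True
print(np.allclose(scaler.scale_, manual_scale))  # True

# transform() computes (X - mean_) / scale_ feature by feature
manual_scaled = (X_train - manual_mean) / manual_scale
print(np.allclose(scaler.transform(X_train), manual_scaled))  # True
```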

To apply it to a dataset:

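```
import numpy as np

# Reuse the statistics learned from X_train for the training data...
X_train_scaled = scaler.transform(X_train)

# ...and for new data. transform() expects a 2-D array of shape
# (n_samples, n_features), so a single sample needs an extra pair of
# brackets (or .reshape(1, -1)); a 1-D array raises a ValueError.
new_data = np.array([[-1.,  1., 0.]])
new_data_scaled = scaler.transform(new_data)
```

```
>>> new_data_scaled
array([[-2.44...,  1.22..., -0.26...]])
```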
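Putting this together for the workflow in the question: fit the scaler on the training data only, fit the model on the scaled training data, and pass every new batch through the same fitted scaler before predicting. The sketch below uses LinearRegression and a small synthetic, exactly linear dataset as stand-ins for the real model and data (both are assumptions), and the file name model.joblib is hypothetical. It also shows a Pipeline, which bundles the scaler and the model into one object so they can be saved and reloaded together with joblib:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data (stand-in for the real X_train / Y_train)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 3))
Y_train = X_train @ np.array([1.5, -2.0, 0.5]) + 4.0  # exactly linear target

# Option 1: keep the fitted scaler next to the model and reuse it
scaler = StandardScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), Y_train)

new_data = np.array([[4.0, 6.0, 5.0]])  # one sample -> shape (1, 3)
pred_manual = model.predict(scaler.transform(new_data))

# Option 2: a pipeline applies the stored scaling automatically in predict()
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, Y_train)
pred_pipe = pipe.predict(new_data)

# Save and reload scaler + model as a single object (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(pipe, path)
loaded = joblib.load(path)
pred_loaded = loaded.predict(new_data)

# Each is approximately [0.5], since 1.5*4 - 2*6 + 0.5*5 + 4 = 0.5
print(pred_manual, pred_pipe, pred_loaded)
```

One advantage of the pipeline is that predict() only calls the scaler's transform(), never fit(), so the new data cannot accidentally be scaled with its own statistics.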

answered Oct 18 '22 13:10

ilyas patanam
