I want to apply the scaling that the sklearn.preprocessing.scale function of scikit-learn offers for centering a dataset that I will use to train an SVM classifier.
How can I then store the standardization parameters so that I can also apply them to the data that I want to classify?
I know I can use StandardScaler, but can I somehow serialize it to a file so that I won't have to fit it to my data every time I want to run the classifier?
StandardScaler is the go-to choice here. 🙂 StandardScaler standardizes a feature by subtracting the mean and then scaling to unit variance, where scaling to unit variance means dividing all the values by the standard deviation. The standard score (also called the z-score) of a sample x is calculated as

z = (x − μ) / σ

where μ is the mean (average) of the training samples and σ is their standard deviation. StandardScaler thus produces a distribution with a standard deviation equal to 1; the variance is also equal to 1, because variance = standard deviation squared.
This operation is performed feature-wise in an independent way: StandardScaler removes the mean and scales each feature/variable to unit variance. Because it involves estimating the empirical mean and standard deviation of each feature, StandardScaler can be influenced by outliers, if they exist in the dataset.
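To make the feature-wise behaviour concrete, here is a minimal sketch (the array X is just illustrative data) showing that transform reproduces the per-column z-score formula:

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1., 10.],
              [2., 20.],
              [3., 30.]])   # two features, standardized independently

scaler = StandardScaler().fit(X)
manual = (X - X.mean(axis=0)) / X.std(axis=0)   # per-feature z-scores

assert np.allclose(scaler.transform(X), manual)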
I think that the best way is to pickle it after fitting, as this is the most generic option. Perhaps you'll later create a pipeline composed of both a feature extractor and a scaler. By pickling a (possibly compound) stage, you're making things more generic. The sklearn documentation on model persistence discusses how to do this.
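As a minimal sketch of that (the file name scaler.pkl and the small arrays are arbitrary placeholders; joblib.dump from the persistence docs works the same way):

import pickle
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1., 2.], [3., 4.], [5., 6.]])   # illustrative training data

scaler = StandardScaler().fit(X_train)

# serialize the fitted scaler once
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# in the classification script, restore it instead of refitting
with open('scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)

X_new = np.array([[2., 3.]])                         # illustrative data to classify
print(scaler.transform(X_new))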
Having said that, you can query sklearn.preprocessing.StandardScaler
for the fit parameters:
scale_ : ndarray, shape (n_features,)
    Per feature relative scaling of the data. New in version 0.17: scale_ is recommended instead of deprecated std_.
mean_ : array of floats with shape [n_features]
    The mean value for each feature in the training set.
The following short snippet illustrates this:
>>> from sklearn import preprocessing
>>> import numpy as np
>>> s = preprocessing.StandardScaler()
>>> _ = s.fit(np.array([[1., 2, 3, 4]]).T)
>>> s.mean_, s.scale_
(array([ 2.5]), array([ 1.11803399]))
Scale with StandardScaler:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(data)
scaled_data = scaler.transform(data)
Save mean_ and var_ for later use:
means = scaler.mean_
variances = scaler.var_
(you can print and copy-paste means and variances, or save them to disk with np.save)
Later use of the saved parameters:
def scale_data(array, means=means, stds=variances ** 0.5):
    return (array - means) / stds

scaled_new_data = scale_data(new_data)
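As a sketch of the np.save route mentioned above (the .npy file names are arbitrary):

import numpy as np

np.save('means.npy', scaler.mean_)
np.save('variances.npy', scaler.var_)

# later, in another session:
means = np.load('means.npy')
stds = np.load('variances.npy') ** 0.5
scaled_new_data = (new_data - means) / stds   # new_data: the array to classify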