I'm using the MinMaxScaler from sklearn to normalize the features of a model.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

training_set = np.random.rand(4, 4) * 10
training_set
[[ 6.01144787, 0.59753007, 2.0014852 , 3.45433657],
[ 6.03041646, 5.15589559, 6.64992437, 2.63440202],
[ 2.27733136, 9.29927394, 0.03718093, 7.7679183 ],
[ 9.86934288, 7.59003904, 6.02363739, 2.78294206]]
scaler = MinMaxScaler()
scaler.fit(training_set)
scaler.transform(training_set)
[[ 0.49184811, 0. , 0.29704831, 0.15972182],
[ 0.4943466 , 0.52384506, 1. , 0. ],
[ 0. , 1. , 0. , 1. ],
[ 1. , 0.80357559, 0.9052909 , 0.02893534]]
Now I want to use the same scaler to normalize the test set:
[[ 8.31263467, 7.99782295, 0.02031658, 9.43249727],
[ 1.03761228, 9.53173021, 5.99539478, 4.81456067],
[ 0.19715961, 5.97702519, 0.53347403, 5.58747666],
[ 9.67505429, 2.76225253, 7.39944931, 8.46746594]]
But I don't want to call scaler.fit() on the training data every time I need to scale new data. Is there a way to save the fitted scaler and load it back later from a different script?
You can save and load the scaler with pickle, which serializes your machine learning objects and writes the serialized bytes to a file.
From the sklearn documentation: class sklearn.preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True) transforms features by scaling each feature to a given range. This estimator scales and translates each feature individually such that it is in the given range on the training set, i.e. between zero and one.
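A minimal sketch (assuming scaler is the fitted MinMaxScaler from the question, and test_set holds the test array):

import pickle

# save the fitted scaler to disk
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

# later, possibly in a different script: load and reuse it
with open('scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)

scaler.transform(test_set)  # no refitting required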
Update: sklearn.externals.joblib is deprecated. Install and use the pure joblib instead. Please see Engineero's answer below, which is otherwise identical to mine.
Even better than pickle (which creates much larger files than this method), you can use sklearn's built-in tool:
from sklearn.externals import joblib
scaler_filename = "scaler.save"
joblib.dump(scaler, scaler_filename)
# And now to load...
scaler = joblib.load(scaler_filename)
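Once loaded, the scaler applies the parameters learned from the training set directly to new data, so there is no refitting (test_set here stands for the test array from the question):

test_set_scaled = scaler.transform(test_set)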
So I'm actually not an expert with this, but from a bit of research and a few helpful links, I think pickle and sklearn.externals.joblib are going to be your friends here.
The pickle package lets you save models, or "dump" them to a file.
I think this link is also helpful. It talks about creating a persistence model. Something that you're going to want to try is:
# could use: import pickle... however let's do something else
from sklearn.externals import joblib
# joblib is more efficient than pickle for objects that carry
# large numpy arrays, which sklearn models often do.
# then just 'dump' your model to a file
joblib.dump(clf, 'my_dope_model.pkl')
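And to load the model back later, for example in a different script (same file name as the dump above):

clf = joblib.load('my_dope_model.pkl')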
Here is where you can learn more about the sklearn externals.
Let me know if that doesn't help or I'm not understanding something about your model.
Note: sklearn.externals.joblib is deprecated. Install and use the pure joblib instead.
Just a note that sklearn.externals.joblib has been deprecated and is superseded by plain old joblib, which can be installed with pip install joblib:
import joblib
joblib.dump(my_scaler, 'scaler.gz')
my_scaler = joblib.load('scaler.gz')
Note that the file extension can be anything, but if it is one of ['.z', '.gz', '.bz2', '.xz', '.lzma'] then the corresponding compression protocol will be used. See the docs for joblib.dump() and joblib.load().
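For instance, a small sketch of extension-based compression (the file name is arbitrary):

joblib.dump(my_scaler, 'scaler.pkl.xz')   # compressed with LZMA, inferred from the .xz suffix
my_scaler = joblib.load('scaler.pkl.xz')  # decompression is inferred the same way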
You can use pickle to save the scaler:
import pickle

scalerfile = 'scaler.sav'
with open(scalerfile, 'wb') as f:
    pickle.dump(scaler, f)
Load it back:
import pickle

scalerfile = 'scaler.sav'
with open(scalerfile, 'rb') as f:
    scaler = pickle.load(f)

test_scaled_set = scaler.transform(test_set)
Alternatively, wrap the scaler and the model in a single pipeline, so both are fitted, saved, and loaded together:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
import joblib  # sklearn.externals.joblib is deprecated; use the standalone joblib package

# the scaler and the model are fitted together and persist as one object
pipeline = make_pipeline(MinMaxScaler(), YOUR_ML_MODEL())
model = pipeline.fit(X_train, y_train)
joblib.dump(model, 'filename.mod')
model = joblib.load('filename.mod')
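Because the scaler lives inside the pipeline, the loaded model scales new data automatically before predicting (a minimal sketch, assuming X_test is your held-out feature matrix):

predictions = model.predict(X_test)  # X_test is min-max scaled internally first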