I have produced different classifier models using scikit-learn, and this has been smooth sailing. Because the features come in different units (the data comes from different sensors, each labeled with its corresponding category), I opted to scale them using the StandardScaler class.
The resulting accuracy scores of the different machine learning classifiers were fine. However, when I try to use a model to predict a raw (i.e., unscaled) instance of sensor values, the model outputs the wrong classification.
Should this really be the case because of the scaling done to the training data? If so, is there an easy way to scale the raw values too? I would like to use model persistence for this via joblib, and it would be appreciated if there were a way to make this as modular as possible, meaning that I do not have to record the mean and standard deviation of each feature every time the training data changes.
Should this really be the case because of the scaling done to the training data?
Yes, this is expected behavior. You trained your model on scaled data, so it will only produce sensible predictions for inputs scaled the same way.
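As a quick illustration (a hypothetical sketch, not part of the original question; the numbers are made up), a classifier fit on standardized features sees raw values as lying far outside the range it was trained on:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# made-up sensor readings in very different units
X_train = np.array([[1000.0, 0.001], [2000.0, 0.002],
                    [1500.0, 0.003], [2500.0, 0.004]])
y_train = np.array([0, 1, 0, 1])

scaler = StandardScaler()
clf = LogisticRegression().fit(scaler.fit_transform(X_train), y_train)

raw = np.array([[1800.0, 0.002]])
clf.predict(raw)                    # unreliable: raw units were never seen in training
clf.predict(scaler.transform(raw))  # correct: same preprocessing as in training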
If so, is there an easy way to scale the raw values too?
Yes, just save your scaler (with joblib, the same way you persist the model).
# Training
from sklearn.preprocessing import StandardScaler
import joblib

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learns and applies per-feature mean/std
...
# do some training, probably save classifier, and save scaler too!
joblib.dump(scaler, 'scaler.joblib')  # file name is up to you
then
# Testing
import joblib

scaler = joblib.load('scaler.joblib')  # load the scaler fitted during training
scaled_instances = scaler.transform(raw_instances)  # applies the stored means/stds
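If you want this even more modular, a Pipeline bundles the scaler and the classifier into a single object that you persist with one joblib call, so scaling happens automatically inside predict(). This is a sketch assuming a LogisticRegression classifier and a file name of my choosing:

import joblib
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# scaler and classifier travel together as one model
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
joblib.dump(model, 'model.joblib')

# later: load and predict directly on raw instances
model = joblib.load('model.joblib')
predictions = model.predict(raw_instances)  # scaling is applied inside the pipeline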
Meaning to say, not to record mean and standard deviation for each feature every time the training data changes
This is exactly what has to happen, although not by hand (that is what the scaler computes for you). Essentially, "under the hood", this is what goes on: the per-feature means and standard deviations have to be stored somewhere.
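You can inspect those stored statistics on any fitted scaler; mean_ and scale_ hold one entry per feature:

scaler = StandardScaler().fit(X_train)
print(scaler.mean_)   # per-feature means learned from the training data
print(scaler.scale_)  # per-feature standard deviations
# transform() then applies (x - mean_) / scale_ to every instance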