I have produced different classifier models using scikit-learn, and this has been smooth sailing. Because the features come in different units (the data comes from different sensors, each labeled with its corresponding category), I opted to scale them using the StandardScaler class.
The resulting accuracy scores of the different machine learning classifiers were fine. However, when I try to use a model to predict a raw (i.e., unscaled) instance of sensor values, the model outputs the wrong classification.
Should this really be the case because of the scaling done to the training data? If so, is there an easy way to scale the raw values too? I would like to use model persistence for this via joblib, and it would be appreciated if there were a way to make this as modular as possible, meaning that I do not have to record the mean and standard deviation of each feature every time the training data changes.
Should this really be the case because of the scaling done to the training data?
Yes, this is expected behavior. You trained your model on scaled data, so it will only produce sensible predictions for inputs scaled the same way.
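As a quick illustration (a hypothetical sketch, not part of the original question; the numbers are made up), a classifier fit on standardized features sees raw values as lying far outside the range it was trained on:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# made-up sensor readings in very different units
X_train = np.array([[1000.0, 0.001], [2000.0, 0.002],
                    [1500.0, 0.003], [2500.0, 0.004]])
y_train = np.array([0, 1, 0, 1])

scaler = StandardScaler()
clf = LogisticRegression().fit(scaler.fit_transform(X_train), y_train)

raw = np.array([[1800.0, 0.002]])
clf.predict(raw)                    # unreliable: raw units were never seen in training
clf.predict(scaler.transform(raw))  # correct: same preprocessing as in training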
If so, is there an easy way to scale the raw values too?
Yes, just save your scaler (with joblib, the same way you persist the model).
# Training
from sklearn.preprocessing import StandardScaler
import joblib

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learns and applies per-feature mean/std
...
# do some training, probably save classifier, and save scaler too!
joblib.dump(scaler, 'scaler.joblib')  # file name is up to you
then
# Testing
import joblib

scaler = joblib.load('scaler.joblib')  # load the scaler fitted during training
scaled_instances = scaler.transform(raw_instances)  # applies the stored means/stds
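If you want this even more modular, a Pipeline bundles the scaler and the classifier into a single object that you persist with one joblib call, so scaling happens automatically inside predict(). This is a sketch assuming a LogisticRegression classifier and a file name of my choosing:

import joblib
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# scaler and classifier travel together as one model
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
joblib.dump(model, 'model.joblib')

# later: load and predict directly on raw instances
model = joblib.load('model.joblib')
predictions = model.predict(raw_instances)  # scaling is applied inside the pipeline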
Meaning to say, not to record mean and standard deviation for each feature every time the training data changes
This is exactly what has to happen, although not by hand (that is what the scaler computes for you). Essentially, "under the hood", this is what goes on: the per-feature means and standard deviations have to be stored somewhere.
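You can inspect those stored statistics on any fitted scaler; mean_ and scale_ hold one entry per feature:

scaler = StandardScaler().fit(X_train)
print(scaler.mean_)   # per-feature means learned from the training data
print(scaler.scale_)  # per-feature standard deviations
# transform() then applies (x - mean_) / scale_ to every instance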