When training a model, say a linear regression, we may apply normalization such as MinMaxScaler to the train and test datasets.
After we get a trained model and use it to make predictions, we need to scale the predictions back to the original representation.
In Python's scikit-learn there is an "inverse_transform" method for this. For example:
from sklearn.preprocessing import MinMaxScaler

data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
scaler = MinMaxScaler()          # default feature_range=(0, 1)
print(data)
dataScaled = scaler.fit(data).transform(data)
print(dataScaled)
print(scaler.inverse_transform(dataScaled))  # recovers the original values
Is there a similar method in Spark (ML)?
I have googled a lot but found no answer. Can anyone give me some suggestions? Thank you very much!
MinMaxScaler rescales each feature to a given range, usually [0, 1]: the minimum of the feature is mapped to 0 and the maximum to 1, via x_scaled = (x - x_min) / (x_max - x_min). This changes the scale of the values without changing the shape of the original distribution. In scikit-learn you define a MinMaxScaler instance (with default hyperparameters if you like), then call fit_transform() on your dataset to get the transformed version, and inverse_transform() to map values back.
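For reference, the Spark ML counterpart of that workflow looks roughly like this (a minimal sketch; the toy data and column names are made up, and a SparkSession named spark is assumed). There is no inverse_transform, but the fitted MinMaxScalerModel exposes the per-feature originalMin and originalMax, which is exactly what a manual inverse needs:

import org.apache.spark.ml.feature.MinMaxScaler
import org.apache.spark.ml.linalg.Vectors

// toy DataFrame with one vector column, analogous to the sklearn example above
val df = spark.createDataFrame(Seq(
  (0, Vectors.dense(-1.0, 2.0)),
  (1, Vectors.dense(-0.5, 6.0)),
  (2, Vectors.dense(0.0, 10.0)),
  (3, Vectors.dense(1.0, 18.0))
)).toDF("id", "features")

val scaler = new MinMaxScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")   // default target range is [0, 1]

val scalerModel = scaler.fit(df)
val scaled = scalerModel.transform(df)

// the fitted model keeps the statistics needed to undo the scaling manually
println(scalerModel.originalMin)   // [-1.0,2.0]
println(scalerModel.originalMax)   // [1.0,18.0]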
In our company, in order to solve the same problem for the StandardScaler, we extended spark.ml with this (among other things):
package org.apache.spark.ml

import org.apache.spark.ml.linalg.DenseVector
import org.apache.spark.ml.util.Identifiable

package object feature {

  implicit class RichStandardScalerModel(model: StandardScalerModel) {

    // standardization is y = (x - mu) / sigma, so the inverse x = y * sigma + mu
    // can itself be expressed as a StandardScalerModel with std' = 1 / sigma
    // and mean' = -mu / sigma
    private def invertedStdDev(sigma: Double): Double = 1 / sigma

    private def invertedMean(mu: Double, sigma: Double): Double = -mu / sigma

    def inverse(newOutputCol: String): StandardScalerModel = {
      val sigma: linalg.Vector = model.std
      val mu: linalg.Vector = model.mean
      val newSigma: linalg.Vector = new DenseVector(sigma.toArray.map(invertedStdDev))
      val newMu: linalg.Vector = new DenseVector(mu.toArray.zip(sigma.toArray).map { case (m, s) => invertedMean(m, s) })

      // the "inverse" model reads the scaled column and writes the unscaled one
      val inverted: StandardScalerModel = new StandardScalerModel(Identifiable.randomUID("stdScal"), newSigma, newMu)
        .setInputCol(model.getOutputCol)
        .setOutputCol(newOutputCol)

      inverted
        .set(inverted.withMean, model.getWithMean)
        .set(inverted.withStd, model.getWithStd)
    }
  }
}
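For completeness, usage could look like this (a hedged sketch, not part of the original extension; trainDF and the column names are assumed, and the implicit class above must be on the classpath and brought in via import org.apache.spark.ml.feature._):

import org.apache.spark.ml.feature._

val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)
  .setWithStd(true)

val scalerModel = scaler.fit(trainDF)
val scaledDF = scalerModel.transform(trainDF)

// ... train a model and predict on the scaled column ...

// build the "inverse" model and map the scaled column back
val unscaler = scalerModel.inverse("featuresOriginal")
val restoredDF = unscaler.transform(scaledDF)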
It should be fairly easy to modify it or do something similar for your specific case.
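For the MinMaxScaler case in the question, one way to "do something similar" without touching spark.ml internals is a UDF that undoes the scaling using the originalMin / originalMax vectors of the fitted model (a minimal sketch, assuming the default [0, 1] target range; mmModel, scaledDF and the column names are made up):

import org.apache.spark.ml.feature.MinMaxScalerModel
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.{col, udf}

def inverseMinMaxUdf(model: MinMaxScalerModel) = {
  val origMin = model.originalMin.toArray
  val origMax = model.originalMax.toArray
  // scaled = (x - origMin) / (origMax - origMin), so x = scaled * (origMax - origMin) + origMin
  udf { v: Vector =>
    Vectors.dense(v.toArray.zipWithIndex.map { case (s, i) =>
      s * (origMax(i) - origMin(i)) + origMin(i)
    })
  }
}

// scaledDF is assumed to contain the scaler's output column "scaledFeatures"
val restoredDF = scaledDF.withColumn("featuresOriginal",
  inverseMinMaxUdf(mmModel)(col("scaledFeatures")))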
Keep in mind that due to JVM's double implementation, you normally lose precision in these operations, so you will not recover the exact original values you had before the transformation (e.g.: you will probably get something like 1.9999999999999998 instead of 2.0).