Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

scikit-learn: how to scale back the 'y' predicted result

I'm trying to learn scikit-learn and Machine Learning by using the Boston Housing Data Set.

# I splitted the initial dataset ('housing_X' and 'housing_y')
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(housing_X, housing_y, test_size=0.25, random_state=33)

# I scaled those two datasets
from sklearn.preprocessing import StandardScaler
scalerX = StandardScaler().fit(X_train)
scalery = StandardScaler().fit(y_train)
X_train = scalerX.transform(X_train)
y_train = scalery.transform(y_train)
X_test = scalerX.transform(X_test)
y_test = scalery.transform(y_test)

# I created the model
from sklearn import linear_model
clf_sgd = linear_model.SGDRegressor(loss='squared_loss', penalty=None, random_state=42) 
train_and_evaluate(clf_sgd,X_train,y_train)

Based on this new model clf_sgd, I am trying to predict the y based on the first instance of X_train.

X_new_scaled = X_train[0]
print (X_new_scaled)
y_new = clf_sgd.predict(X_new_scaled)
print (y_new)

However, the result is quite odd for me (1.34032174, instead of 20-30, the range of the price of the houses)

[-0.32076092  0.35553428 -1.00966618 -0.28784917  0.87716097  1.28834383
  0.4759489  -0.83034371 -0.47659648 -0.81061061 -2.49222645  0.35062335
 -0.39859013]
[ 1.34032174]

I guess that this 1.34032174 value should be scaled back, but I am trying to figure out how to do it with no success. Any tip is welcome. Thank you very much.

like image 706
Hookstark Avatar asked Jun 27 '16 16:06

Hookstark


People also ask

What does predict () function of Sklearn do?

The Sklearn 'Predict' Method Predicts an OutputThat being the case, it provides a set of tools for doing things like training and evaluating machine learning models. What is this? And it also has tools to predict an output value, once the model is trained (for ML techniques that actually make predictions).

What is Fit_transform in Sklearn?

fit_transform() is used on the training data so that we can scale the training data and also learn the scaling parameters of that data. Here, the model built by us will learn the mean and variance of the features of the training set. These learned parameters are then used to scale our test data.

What does Fit_transform return?

fit_transform() then it will calculate the mean(μ) and standard deviation(σ) of the feature F at a time it will transform the data points of the feature F.


2 Answers

You can use inverse_transform using your scalery object:

y_new_inverse = scalery.inverse_transform(y_new)
like image 151
Ryan Avatar answered Oct 26 '22 23:10

Ryan


Bit late to the game: Just don't scale your y. With scaling y you actually loose your units. The regression or loss optimization is actually determined by the relative differences between the features. BTW for house prices (or any other monetary value) it is common practice to take the logarithm. Then you obviously need to do an numpy.exp() to get back to the actual dollars/euros/yens...

like image 29
Maartenk Avatar answered Oct 26 '22 23:10

Maartenk