
How to not standardize target data in scikit-learn regression

I am trying to predict future profit from a CSV dataset of a copper mining company.

I read the data:

data = pd.read_csv('data.csv')

I split the data:

data_target = data[target].astype(float)
data_used = data.drop(['Periodo', 'utilidad_operativa_dolar'], axis=1)
x_train, x_test, y_train, y_test = train_test_split(data_used, data_target, test_size=0.4,random_state=33)

Create an SVR predictor:

clf_svr= svm.SVR(kernel='rbf')

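Standardize the data:

from sklearn.preprocessing import StandardScaler
scalerX = StandardScaler().fit(x_train)
# StandardScaler expects 2D input, so the 1D target is reshaped to a single column
scalery = StandardScaler().fit(y_train.values.reshape(-1, 1))

x_train = scalerX.transform(x_train)
y_train = scalery.transform(y_train.values.reshape(-1, 1)).ravel()
x_test = scalerX.transform(x_test)
y_test = scalery.transform(y_test.values.reshape(-1, 1)).ravel()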

print(np.max(x_train), np.min(x_train), np.mean(x_train), np.max(y_train), np.min(y_train), np.mean(y_train))

Then fit and predict:

clf_svr.fit(x_train, y_train)
y_pred = clf_svr.predict(x_test)

The predictions are standardized as well. I want the predicted data to be in the original units. How can I do that?

Pedro Muñoz asked Oct 27 '14 09:10



2 Answers

You would want to use the inverse_transform method of your y-scaler. Note that you can do all this more concisely using a pipeline, as follows:

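import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# The pipeline standardizes the features and then fits the SVR
pipeline = Pipeline([('scaler', StandardScaler()), ('estimator', SVR(kernel="rbf"))])

# The target gets its own scaler; StandardScaler expects 2D input,
# so the 1D target is reshaped to a column and flattened again afterwards
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(np.asarray(y_train).reshape(-1, 1)).ravel()
pipeline.fit(x_train, y_train_scaled)
# Map the standardized predictions back to the original units
y_pred = y_scaler.inverse_transform(pipeline.predict(x_test).reshape(-1, 1)).ravel()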

Many people would simply scale the target over the whole dataset (train and test together) and get away with it without much overfitting, but you are right not to do that. AFAIK, using a separate scaler for the y data, fitted on the training targets only as shown in the code, is the only way to go.
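
To see the round trip end to end, here is a minimal self-contained sketch on synthetic data. The generated arrays and the name y_train_scaled are made up for illustration and are not the asker's CSV. It fits the y-scaler on the training targets only, trains on the standardized target, and maps the predictions back to the original units:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption: not the asker's dataset)
rng = np.random.RandomState(33)
X = rng.normal(size=(200, 3))
y = 1000.0 + 250.0 * X[:, 0] + rng.normal(scale=10.0, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=33
)

# Feature scaling lives inside the pipeline
pipeline = Pipeline([("scaler", StandardScaler()), ("estimator", SVR(kernel="rbf"))])

# Target scaling uses a separate scaler fitted on the training targets only
y_scaler = StandardScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1)).ravel()

pipeline.fit(x_train, y_train_scaled)

# Predictions come out standardized; inverse_transform restores original units
y_pred_scaled = pipeline.predict(x_test)
y_pred = y_scaler.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()
```

After this, y_pred is directly comparable to the untouched y_test, so there is no need to standardize y_test at all when computing error metrics in the original units.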

eickenberg answered Sep 26 '22 05:09



I know this question is old and the answer was correct at the time, but scikit-learn now has a built-in way of doing this: TransformedTargetRegressor.

http://scikit-learn.org/dev/modules/compose.html#transforming-target-in-regression
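
The linked section documents sklearn.compose.TransformedTargetRegressor, which wraps a regressor, scales the target before fitting, and inverts the scaling on predict. A minimal sketch, assuming synthetic data in place of the asker's CSV (the arrays and the train/test slicing below are made up for illustration):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption: not the asker's dataset)
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = 1000.0 + 250.0 * X[:, 0] + rng.normal(scale=10.0, size=200)

# The inner pipeline scales the features; the wrapper scales the target
model = TransformedTargetRegressor(
    regressor=Pipeline(
        [("scaler", StandardScaler()), ("estimator", SVR(kernel="rbf"))]
    ),
    transformer=StandardScaler(),
)

# fit() standardizes y internally; predict() returns values in original units
model.fit(X[:150], y[:150])
y_pred = model.predict(X[150:])
```

With this wrapper there is no separate y-scaler to keep track of, and the target never has to be reshaped by hand.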

Olivier answered Sep 23 '22 05:09
