 

Getting an error on StandardScaler fit_transform

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

OK, so here is the problem. Both X and y hold a single feature and have one column. As you can see, X is a matrix (a 2D array) and y is a vector (a 1D array):

X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

Now when I run y = sc_y.fit_transform(y), I get an error saying that y is a 1D array. I could change the line to y = dataset.iloc[:, 2:3].values, which makes y a 2D array, but I want y to stay a 1D array since it is the dependent variable. Also, I have worked through other examples where I had to rescale similar data, and they did not give me this kind of error, so I am not sure why I am getting it now. Moreover, I am following a video while coding; the code in the video is the same, but the presenter doesn't get any error.

Vikas Kyatannawar asked Dec 06 '17 13:12




3 Answers

StandardScaler is meant to work on the features, not on labels or target data, so it only accepts 2D data. Please see the documentation here:

  • http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling

What you can do is use the scale function, which accepts a 1D array. StandardScaler is just a wrapper over this function.

from sklearn.preprocessing import scale
y = scale(y)

Or, if you want to use StandardScaler, you need to reshape your y to a 2D array like this:

import numpy as np
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)
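
To see that the two options agree, here is a minimal sketch. The salary values are made up for illustration (they are not taken from Position_Salaries.csv), and the ravel() call at the end is only there to get a 1D result back for the comparison.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

# Hypothetical target values standing in for the y column of the dataset.
y = np.array([45000.0, 50000.0, 60000.0, 80000.0, 110000.0])

# Option 1: scale() takes the 1D array directly and returns a 1D array.
y_scaled_fn = scale(y)

# Option 2: StandardScaler needs a 2D column, so reshape before fitting,
# then ravel the (n, 1) result back to 1D.
sc_y = StandardScaler()
y_scaled_cls = sc_y.fit_transform(y.reshape(-1, 1)).ravel()

print(y_scaled_fn.shape, y_scaled_cls.shape)
print(np.allclose(y_scaled_fn, y_scaled_cls))
```

The practical difference is that scale() does not keep the fitted mean and standard deviation, so StandardScaler is the better fit if you need to apply the same scaling to other data later.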
Vivek Kumar answered Oct 13 '22 08:10


StandardScaler used to work with 1D arrays, but it emitted a DeprecationWarning: "Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1,1) if your data has a single feature or X.reshape(1,-1) if it contains a single sample." That is probably why your earlier examples and the video ran without an error: they were using an older scikit-learn version.
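
The two reshape calls named in that message can be sketched as follows. The array values are illustrative only, and the try/except part assumes scikit-learn 0.19 or later, where 1D input raises a ValueError instead of a warning.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

y = np.array([1.0, 2.0, 3.0, 4.0])  # illustrative 1D array

as_column = y.reshape(-1, 1)  # 4 samples, 1 feature
as_row = y.reshape(1, -1)     # 1 sample, 4 features

print(y.shape)          # (4,)
print(as_column.shape)  # (4, 1)
print(as_row.shape)     # (1, 4)

# Assumption: scikit-learn >= 0.19, where 1D input is rejected outright.
rejected_1d = False
try:
    StandardScaler().fit_transform(y)
except ValueError:
    rejected_1d = True
print("1D input rejected:", rejected_1d)

# The column form is the one that matches a single-feature target.
scaled = StandardScaler().fit_transform(as_column)
print(scaled.shape)     # (4, 1)
```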

So the solution you are looking for is:

sc_y = StandardScaler()
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)
y = y.flatten()
Isaac Dm answered Oct 13 '22 08:10


from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X = sc_X.fit_transform(X)

sc_y = StandardScaler()
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)
y = y.flatten()
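
One thing worth knowing if you flatten y like this: inverse_transform on the same scaler also expects a 2D array, so the reshape is needed again when converting scaled values back to the original units. A minimal sketch of that round trip, using made-up salary values rather than the real dataset:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical target values standing in for the y column of the dataset.
y = np.array([45000.0, 50000.0, 60000.0, 80000.0, 110000.0])

# Scale: reshape to a column, fit and transform, then flatten back to 1D.
sc_y = StandardScaler()
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).flatten()

# Undo the scaling: reshape to a column again before inverse_transform.
y_restored = sc_y.inverse_transform(y_scaled.reshape(-1, 1)).flatten()

print(y_scaled.shape, y_restored.shape)
print(np.allclose(y, y_restored))
```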
kasbi rida answered Oct 13 '22 09:10