Getting Error on StandardScaler fit_transform

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)

Here is the problem. Both X and y have a single feature, so each has one column. X is a 2D matrix and y is a 1D vector because of how they are sliced: X = dataset.iloc[:, 1:2].values and y = dataset.iloc[:, 2].values.

When I run y = sc_y.fit_transform(y), I get an error saying that y is a 1D array. The error goes away if I change the slice to y = dataset.iloc[:, 2:3].values, which makes y a 2D array, but I want y to stay 1D since it is the dependent variable. I have also rescaled similar data in earlier examples without getting this error, so I am not sure why it appears now. Moreover, I am following a video while coding, and the code in the video is the same but runs without any error.

What does fit_transform mean?

In layman's terms, fit_transform means computing some statistics from the data and then transforming the data with them (for example, calculating the column means of a dataset and then using them to replace missing values). For the training set, you need to do both: compute the statistics and apply the transformation.

What is the difference between fit(), transform() and fit_transform()?

The fit(data) method computes the mean and standard deviation of each feature, to be used later for scaling. The transform(data) method performs the scaling using the mean and standard deviation computed by fit(). The fit_transform() method does both: it fits and then transforms.
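As a minimal sketch of that relationship (the array values below are made up for illustration), calling fit() and then transform() should give the same result as a single fit_transform() call on the same data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up single-feature data, shaped (n_samples, 1) as StandardScaler expects.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Two-step version: learn the mean and standard deviation, then apply them.
scaler_a = StandardScaler()
scaler_a.fit(X)
X_two_step = scaler_a.transform(X)

# One-step version on a fresh scaler.
scaler_b = StandardScaler()
X_one_step = scaler_b.fit_transform(X)

# Both routes should produce the same scaled values.
same_result = np.allclose(X_two_step, X_one_step)

# The fitted scaler keeps what it learned in attributes such as mean_.
learned_mean = scaler_a.mean_[0]
```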

What does a scaler's fit_transform do?

fit_transform() is used on the training data to learn the scaling parameters and scale the training data in one step. The scaler learns the mean and variance of each feature in the training set, and these learned parameters are then used to scale the test data.
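A short sketch of that train/test pattern, using made-up arrays in place of a real split (the names X_train and X_test are illustrative only):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test data with one feature each.
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[20.0], [40.0]])

scaler = StandardScaler()

# Learn the mean and variance from the training data and scale it.
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training parameters on the test data: transform only, no refit.
X_test_scaled = scaler.transform(X_test)
```

Because the test data is scaled with the training mean and standard deviation, the scaled test values are not expected to have zero mean themselves.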

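Should I use fit or fit_transform?

fit_transform() fits the transformer to the input data and transforms that data in a single call. The result is the same as calling fit() and then transform() on the same data; the combined call is a convenience and can be computationally cheaper for some transformers. Use fit_transform() on the training data, and plain transform() on the test data so that the parameters learned from the training set are reused.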

StandardScaler is meant to work on features, not on labels or target data, so it only accepts 2D input. See the documentation:

http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling

What you can do is use the scale function, which accepts 1D arrays. StandardScaler performs the same standardization through the transformer API.

from sklearn.preprocessing import scale
y = scale(y)
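As a rough check of that claim, the sketch below compares the two routes on a made-up 1D array (the values are placeholders, not the real dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

# Made-up 1D target values standing in for the real column.
y = np.array([45.0, 50.0, 60.0, 80.0])

# scale() takes the 1D array directly and returns a 1D array.
y_scaled_func = scale(y)

# StandardScaler needs a 2D column, so reshape in and ravel back out.
y_scaled_cls = StandardScaler().fit_transform(y.reshape(-1, 1)).ravel()

# Both routes should give the same standardized values.
matches = np.allclose(y_scaled_func, y_scaled_cls)
```

Note that scale() returns only the scaled array, so there is no fitted object to reuse on new data or to undo the scaling later.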

Or, if you want to use StandardScaler, you need to reshape your y into a 2D array like this:

import numpy as np
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)

StandardScaler used to accept 1D arrays, but only with a DeprecationWarning: "Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample." This likely explains why the same code runs without error in an older video but fails on a newer scikit-learn version.
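To make the two reshape options concrete, here is a small sketch on a made-up array. It assumes scikit-learn 0.19 or later, where passing a 1D array raises ValueError as the warning announced:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up 1D data with four values.
y = np.array([1.0, 2.0, 3.0, 4.0])

# One feature, four samples: a single column.
single_feature = y.reshape(-1, 1)   # shape (4, 1)

# One sample, four features: a single row.
single_sample = y.reshape(1, -1)    # shape (1, 4)

# On scikit-learn >= 0.19 (assumed here), the raw 1D array is rejected.
try:
    StandardScaler().fit_transform(y)
    raised = False
except ValueError:
    raised = True

# The column-shaped version is accepted.
scaled = StandardScaler().fit_transform(single_feature)
```

For a target column like y in the question, reshape(-1, 1) is the one that matches the data: many samples of a single quantity.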

So the following is the solution you are looking for:

sc_y = StandardScaler()
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)
y = y.flatten()

from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X = sc_X.fit_transform(X)

sc_y = StandardScaler()
y = np.array(y).reshape(-1,1)
y = sc_y.fit_transform(y)
y = y.flatten()
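One reason to keep the fitted sc_y around rather than discarding it: it can map scaled values back to the original units with inverse_transform. The sketch below uses made-up salary values and is only an illustration of the round trip, not part of the original answers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up 1D target values standing in for the salary column.
y = np.array([45000.0, 50000.0, 60000.0, 80000.0])

# Scale as in the answer above: reshape to 2D, fit, then flatten back to 1D.
sc_y = StandardScaler()
y_scaled = sc_y.fit_transform(y.reshape(-1, 1)).flatten()

# Undo the scaling; inverse_transform also expects 2D input.
y_restored = sc_y.inverse_transform(y_scaled.reshape(-1, 1)).flatten()

# The round trip should recover the original values.
round_trip_ok = np.allclose(y, y_restored)
```

The same reshape-then-flatten pattern would apply to predictions made by a model trained on the scaled y.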

Tags:

python

arrays

machine-learning

scikit-learn

sklearn-pandas

Vikas Kyatannawar

3 Answers

Vivek Kumar

Isaac Dm

kasbi rida
