
Apply MinMaxScaler() on a pandas column

I am trying to use the sklearn MinMaxScaler to rescale a pandas column like below:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
y = scaler.fit(df['total_amount'])

But I got the following error:

Traceback (most recent call last):
  File "/Users/edamame/workspace/git/my-analysis/experiments/my_seq.py", line 54, in <module>
    y = scaler.fit(df['total_amount'])
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 308, in fit
    return self.partial_fit(X, y)
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 334, in partial_fit
    estimator=self, dtype=FLOAT_DTYPES)
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[3.180000e+00 2.937450e+03 6.023850e+03 2.216292e+04 1.074589e+04
   :
 0.000000e+00 0.000000e+00 9.000000e+01 1.260000e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Any idea what was wrong?

Edamame asked Aug 01 '18 22:08




1 Answer

The input to MinMaxScaler needs to be array-like, with shape [n_samples, n_features]. So you can apply it on the column as a dataframe rather than a series (using double square brackets instead of single):

y = scaler.fit(df[['total_amount']])

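Though from your description, it sounds like you want fit_transform rather than just fit (but I could be wrong). fit only learns the column's minimum and maximum and returns the scaler itself, whereas fit_transform also returns the rescaled values: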

y = scaler.fit_transform(df[['total_amount']])
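
To see this end to end, here is a minimal sketch. The dataframe values and the total_amount_scaled column name are made up for illustration and are not from the question:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the real dataframe
df = pd.DataFrame({"total_amount": [0.0, 25.0, 50.0, 100.0]})

scaler = MinMaxScaler()
# Double brackets select a one-column DataFrame, i.e. a 2D input of shape (4, 1)
scaled = scaler.fit_transform(df[["total_amount"]])

# With default settings fit_transform returns a 2D NumPy array,
# so flatten it before storing it as a new column
df["total_amount_scaled"] = scaled.ravel()
print(df)
```

Keeping the fitted scaler around is useful if you later need scaler.transform on new data or scaler.inverse_transform to get back to the original units.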

A little more explanation:

If your dataframe had 100 rows, consider the difference in shape when you transform a column to an array:

>>> np.array(df[['total_amount']]).shape
(100, 1)

>>> np.array(df['total_amount']).shape
(100,)

The first returns a shape that matches [n_samples, n_features] (as required by MinMaxScaler), whereas the second does not.
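
The error message itself points at an equivalent fix: reshape the 1D values into a single feature column with reshape(-1, 1). A small sketch of that route, again with made-up values:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values, not taken from the question
df = pd.DataFrame({"total_amount": [3.0, 6.0, 9.0]})

values_1d = df["total_amount"].to_numpy()  # 1D array, shape (3,)
values_2d = values_1d.reshape(-1, 1)       # 2D array, shape (3, 1): one feature

# The 2D array is accepted where the 1D Series raised the ValueError
y = MinMaxScaler().fit_transform(values_2d)
print(values_1d.shape, values_2d.shape, y.shape)
```

Both approaches hand the scaler the same 2D data; the double-bracket form is just shorter when you are starting from a dataframe.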

sacuL answered Oct 11 '22 00:10