Apply MinMaxScaler() on a pandas column

I am trying to use the sklearn MinMaxScaler to rescale a pandas column like below:

scaler = MinMaxScaler()
y = scaler.fit(df['total_amount'])

But I got the following error:

Traceback (most recent call last):
  File "/Users/edamame/workspace/git/my-analysis/experiments/my_seq.py", line 54, in <module>
    y = scaler.fit(df['total_amount'])
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 308, in fit
    return self.partial_fit(X, y)
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/preprocessing/data.py", line 334, in partial_fit
    estimator=self, dtype=FLOAT_DTYPES)
  File "/Users/edamame/workspace/git/my-analysis/venv/lib/python3.4/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[3.180000e+00 2.937450e+03 6.023850e+03 2.216292e+04 1.074589e+04
   :
 0.000000e+00 0.000000e+00 9.000000e+01 1.260000e+03].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Any idea what was wrong?

806

asked Aug 01 '18 22:08

Edamame

1 Answer

The input to MinMaxScaler needs to be array-like, with shape [n_samples, n_features]. So you can apply it on the column as a dataframe rather than a series (using double square brackets instead of single):

y = scaler.fit(df[['total_amount']])

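Though from your description, it sounds like you want fit_transform rather than just fit (but I could be wrong). fit only learns the column's minimum and maximum and returns the scaler itself, whereas fit_transform also returns the rescaled values: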

y = scaler.fit_transform(df[['total_amount']])
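
To make that concrete, here is a minimal sketch of the fit_transform route. The small dataframe and the total_amount_scaled column name are made up for illustration and are not from the question; only the total_amount column name is.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the asker's dataframe
df = pd.DataFrame({"total_amount": [0.0, 25.0, 50.0, 100.0]})

scaler = MinMaxScaler()

# Double brackets select a one-column DataFrame, so the input is 2D: (n_samples, 1)
scaled = scaler.fit_transform(df[["total_amount"]])

# fit_transform returns a 2D NumPy array; flatten it before storing it as a column
# ("total_amount_scaled" is an assumed name, chosen for this example)
df["total_amount_scaled"] = scaled.ravel()

print(df)
```

Flattening with ravel() is one way to get back to a 1D column; assigning the 2D result to df[["total_amount_scaled"]] would be another.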

A little more explanation:

If your dataframe had 100 rows, consider the difference in shape when you transform a column to an array:

>>> np.array(df[['total_amount']]).shape
(100, 1)

>>> np.array(df['total_amount']).shape
(100,)

The first returns a shape that matches [n_samples, n_features] (as required by MinMaxScaler), whereas the second does not.
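
The error message itself suggests another route: reshape the 1D values with reshape(-1, 1). A rough sketch of that alternative, again with a made-up dataframe (only the column name comes from the question):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data standing in for the asker's dataframe
df = pd.DataFrame({"total_amount": [3.18, 90.0, 1260.0]})

# Single brackets give a Series, which converts to a 1D array
values_1d = df["total_amount"].to_numpy()

# reshape(-1, 1) turns it into one feature column: (n_samples, 1)
values_2d = values_1d.reshape(-1, 1)

print(values_1d.shape)
print(values_2d.shape)

# The 2D array is accepted by the scaler without the ValueError
result = MinMaxScaler().fit_transform(values_2d)
print(result)
```

Both approaches hand the scaler the same 2D data; the double-bracket version is simply shorter when you are already working with a dataframe.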

182

answered Oct 11 '22 00:10

sacuL

Tags: python-3.x, pandas, scikit-learn
