
Python: Scaling numbers column by column with pandas

Tags: python, pandas

I have a pandas DataFrame 'df' that I'd like to scale column by column.

  • In column 'a', I need the maximum number to be 1, the minimum number to be 0, and all others to be spread accordingly.
  • In column 'b', however, I need the minimum number to be 1, the maximum number to be 0, and all others to be spread accordingly.

Is there a Pandas function to perform these two operations? If not, numpy would certainly do.

        a    b
    A  14  103
    B  90  107
    C  90  110
    D  96  114
    E  91  114
Lucien S. asked Feb 13 '14 20:02




2 Answers

Here is how you can do it with scikit-learn's preprocessing module. scikit-learn has many preprocessing functions for scaling and centering data.

    In [0]: import pandas as pd
       ...: from sklearn.preprocessing import MinMaxScaler

    In [1]: df = pd.DataFrame({'A':[14,90,90,96,91],
                               'B':[103,107,110,114,114]}).astype(float)

    In [2]: df
    Out[2]:
        A    B
    0  14  103
    1  90  107
    2  90  110
    3  96  114
    4  91  114

    In [3]: scaler = MinMaxScaler()

    In [4]: df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

    In [5]: df_scaled
    Out[5]:
              A         B
    0  0.000000  0.000000
    1  0.926829  0.363636
    2  0.926829  0.636364
    3  1.000000  1.000000
    4  0.939024  1.000000
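The session above scales both columns from 0 to 1, while the question asks for column 'b' to run from 1 down to 0. A minimal sketch of one way to get there, assuming the question's lowercase column names and A to E index, is to subtract the scaled 'b' column from 1 after the transform. The variable name `scaled` is only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Data from the question, with its original row labels.
df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
).astype(float)

# Scale every column to the default feature range [0, 1].
scaler = MinMaxScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(df), columns=df.columns, index=df.index
)

# Flip column 'b' so its minimum maps to 1 and its maximum maps to 0.
scaled["b"] = 1 - scaled["b"]

print(scaled)
```

Passing `index=df.index` keeps the row labels, which `fit_transform` otherwise drops because it returns a plain NumPy array.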
Zelazny7 answered Sep 21 '22 08:09


You could subtract the min, then divide by the max (beware 0/0, which happens when a column is constant). Note that after subtracting the min, the new max is the original max minus the original min.

    In [11]: df
    Out[11]:
        a    b
    A  14  103
    B  90  107
    C  90  110
    D  96  114
    E  91  114

    In [12]: df -= df.min()  # equivalent to df = df - df.min()

    In [13]: df /= df.max()  # equivalent to df = df / df.max()

    In [14]: df
    Out[14]:
              a         b
    A  0.000000  0.000000
    B  0.926829  0.363636
    C  0.926829  0.636364
    D  1.000000  1.000000
    E  0.939024  1.000000
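The 0/0 case mentioned above arises when a column is constant, because its range is zero and the division yields NaN. The following is a rough sketch of one way to guard against it. The helper name `min_max_scale`, the extra constant column 'c', and the choice to map a constant column to 0 are all assumptions made for illustration, not part of the original answer.

```python
import pandas as pd


def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to [0, 1]; constant columns become all zeros."""
    shifted = frame - frame.min()
    span = frame.max() - frame.min()
    # Replace a zero range with 1 so a constant column gives 0 / 1 = 0
    # instead of 0 / 0 = NaN.
    return shifted / span.replace(0, 1)


# Data from the question plus a hypothetical constant column 'c'.
df = pd.DataFrame(
    {
        "a": [14, 90, 90, 96, 91],
        "b": [103, 107, 110, 114, 114],
        "c": [5, 5, 5, 5, 5],
    },
    index=list("ABCDE"),
)

scaled = min_max_scale(df)
print(scaled)
```

Unlike the in-place `-=` and `/=` session, this version returns a new DataFrame and leaves `df` unchanged.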

To switch the order of a column (from 1 to 0 rather than 0 to 1):

    In [15]: df['b'] = 1 - df['b']

An alternative method is to negate the b column first (df['b'] = -df['b']), before scaling.
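As a sketch of that alternative, assuming the same data as in the question: negating 'b' turns its smallest value into the largest, so the usual min-max scaling then maps the original minimum to 1 and the original maximum to 0. The name `scaled` is illustrative.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
).astype(float)

# Negate 'b' first so that its original minimum becomes the maximum.
df["b"] = -df["b"]

# Then apply the same min-max scaling to every column.
scaled = (df - df.min()) / (df.max() - df.min())

print(scaled)
```

For column 'b' this gives the same values as scaling first and then computing 1 - df['b'].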

Andy Hayden answered Sep 24 '22 08:09