I have a Pandas data frame 'df' in which I'd like to perform some scalings column by column.
Is there a Pandas function to perform these two operations? If not, numpy would certainly do.
    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114
1. Using min-max feature scaling. The min-max approach (often called normalization) rescales a feature to the fixed range [0, 1] by subtracting the feature's minimum value and then dividing by its range (maximum minus minimum). We can apply min-max scaling in Pandas using the .min() and .max() methods.
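As a minimal sketch of the min-max formula described above, applied to the data from the question (the name df_minmax is only illustrative):

```python
import pandas as pd

# Example data taken from the question.
df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
)

# Subtract each column's minimum, then divide by each column's range.
# df.min() and df.max() return one value per column, and pandas aligns
# those values with the columns of df during the arithmetic.
df_minmax = (df - df.min()) / (df.max() - df.min())

print(df_minmax)
```

This leaves the original df untouched and returns a new DataFrame, which may be preferable to in-place updates when the raw values are still needed.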
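2. Pandas Normalize Using Mean Normalization. To normalize all columns of a pandas DataFrame this way, subtract each column's mean and divide by its standard deviation. This is also commonly called standardization or z-score scaling; the result has mean 0 and standard deviation 1 per column, but it is not confined to [0, 1].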
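The mean-and-standard-deviation scaling above could look like the following sketch, again on the question's data (df_standardized is an illustrative name). Note that pandas' std() uses the sample standard deviation (ddof=1) by default, whereas sklearn's StandardScaler uses the population standard deviation (ddof=0), so the two give slightly different numbers unless ddof is set explicitly.

```python
import pandas as pd

# Example data taken from the question.
df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
)

# Subtract the column mean and divide by the column standard deviation.
# DataFrame.std() defaults to ddof=1 (sample standard deviation);
# use df.std(ddof=0) for the population version.
df_standardized = (df - df.mean()) / df.std()

print(df_standardized)
```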
This is how you can do it using sklearn and its preprocessing module. Scikit-learn has many preprocessing functions for scaling and centering data.
In [0]: import pandas as pd
   ...: from sklearn.preprocessing import MinMaxScaler

In [1]: df = pd.DataFrame({'A': [14, 90, 90, 96, 91], 'B': [103, 107, 110, 114, 114]}).astype(float)

In [2]: df
Out[2]:
      A      B
0  14.0  103.0
1  90.0  107.0
2  90.0  110.0
3  96.0  114.0
4  91.0  114.0

In [3]: scaler = MinMaxScaler()

In [4]: df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

In [5]: df_scaled
Out[5]:
          A         B
0  0.000000  0.000000
1  0.926829  0.363636
2  0.926829  0.636364
3  1.000000  1.000000
4  0.939024  1.000000
You could subtract the min, then divide by the max (beware of 0/0, which happens when every value in a column is the same). Note that after subtracting the min, the new max is the original max - min.
In [11]: df
Out[11]:
    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114

In [12]: df -= df.min()  # equivalent to df = df - df.min()

In [13]: df /= df.max()  # equivalent to df = df / df.max()

In [14]: df
Out[14]:
          a         b
A  0.000000  0.000000
B  0.926829  0.363636
C  0.926829  0.636364
D  1.000000  1.000000
E  0.939024  1.000000
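One possible way to handle the 0/0 case is sketched below. The helper min_max_scale and the constant column 'c' are hypothetical and not part of the original answer, and mapping a constant column to 0 is just one convention; leaving it as NaN would be equally defensible.

```python
import pandas as pd


def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: scale each column to [0, 1].

    Columns with zero range (all values equal) are mapped to 0.0.
    """
    col_min = frame.min()
    col_range = frame.max() - col_min
    # Where the range is zero, use NaN as the divisor so the result is NaN
    # rather than the outcome of a 0/0 division.
    safe_range = col_range.where(col_range != 0)
    scaled = (frame - col_min) / safe_range
    # Assumption: the input has no missing values, so any NaN here comes
    # from a zero-range column and can be replaced by 0.0.
    return scaled.fillna(0.0)


# 'a' is from the question; 'c' is a made-up constant column.
df = pd.DataFrame({"a": [14, 90, 90, 96, 91], "c": [5, 5, 5, 5, 5]})
scaled = min_max_scale(df)
print(scaled)
```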
To switch the order of a column (from 1 to 0 rather than 0 to 1):
In [15]: df['b'] = 1 - df['b']
An alternative method is to negate the b column first (df['b'] = -df['b']) and then apply the same min-max scaling.
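The two ways of reversing column b can be checked against each other with a short sketch like this one (variable names are illustrative):

```python
import pandas as pd

# Example data taken from the question.
df = pd.DataFrame(
    {"a": [14, 90, 90, 96, 91], "b": [103, 107, 110, 114, 114]},
    index=list("ABCDE"),
)

# Option 1: scale b to [0, 1], then flip it so the max maps to 0.
b_scaled = (df["b"] - df["b"].min()) / (df["b"].max() - df["b"].min())
flipped = 1 - b_scaled

# Option 2: negate b first, then apply the usual min-max scaling.
neg_b = -df["b"]
negated_then_scaled = (neg_b - neg_b.min()) / (neg_b.max() - neg_b.min())

# Both approaches should agree up to floating-point rounding.
print((flipped - negated_then_scaled).abs().max())
```

Either way, the smallest original value of b ends up at 1 and the largest at 0.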