I want to group my DataFrame by a specific column, apply sklearn's preprocessing MinMaxScaler to each group, and store the fitted scaler objects.
My current starting point:
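import pandas as pd
from sklearn import preprocessing

scaler = {}  # maps each ID to its fitted MinMaxScaler
groups = df.groupby('ID')
for name, group in groups:
    values = group.drop(columns='ID')  # scale only the value columns, not the ID labels
    scr = preprocessing.MinMaxScaler()
    scr.fit(values)
    scaler.update({name: scr})
    group = scr.transform(values)  # note: only rebinds the loop variable; df is unchanged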
Is this possible with df.groupby('ID').transform?
UPDATE
From my original DataFrame
pd.DataFrame(dict(ID=list('AAABBB'),
                  VL=(0, 10, 10, 100, 100, 200)))
I want to scale all columns based on ID. In this example:
A 0.0
A 1.0
A 1.0
B 0.0
B 0.0
B 1.0
while keeping the information in the scaler object for each group (initialized with fit):
preprocessing.MinMaxScaler().fit( ... )
The “group by” process follows split-apply-combine:
(1) Split the data into groups by creating a groupby object from the original DataFrame.
(2) Apply a function to each group independently. This can be an aggregation that computes a summary statistic, a transformation, or a filter; per-group scaling is a transformation.
(3) Combine the results into a new data structure, such as a DataFrame.
Groupby preserves the order of rows within each group.
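To make the three steps concrete, here is a minimal sketch on the example data from the question. The helper name min_max is hypothetical (it is not a pandas or sklearn function), and it assumes every group has at least two distinct values, otherwise the division is by zero.

```python
import pandas as pd

df = pd.DataFrame({"ID": list("AAABBB"), "VL": [0, 10, 10, 100, 100, 200]})

# (1) Split: build the groupby object.
groups = df.groupby("ID")

# (2) Apply: hypothetical helper that min-max scales one group's values.
# Assumes the group is not constant (max != min).
def min_max(s):
    s = s.astype(float)
    return (s - s.min()) / (s.max() - s.min())

pieces = [min_max(group["VL"]) for _, group in groups]

# (3) Combine: stitch the per-group results back together in the original row order.
scaled = pd.concat(pieces).sort_index()

# transform performs steps (2) and (3) in a single call.
shortcut = groups["VL"].transform(min_max)
```

Both scaled and shortcut are aligned with the index of df, so either can be assigned back as a new column.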
You can do it in one direction:
In [62]: from sklearn.preprocessing import minmax_scale
In [63]: df
Out[63]:
  ID   VL  SC
0  A    0   0
1  A   10   1
2  A   10   1
3  B  100   0
4  B  100   0
5  B  200   1
In [64]: df['SC'] = df.groupby('ID').VL.transform(lambda x: minmax_scale(x.astype(float)))
In [65]: df
Out[65]:
  ID   VL  SC
0  A    0   0
1  A   10   1
2  A   10   1
3  B  100   0
4  B  100   0
5  B  200   1
but you will not be able to use inverse_transform, as each call of MinMaxScaler (for each group or each ID) will overwrite the information about your original features...
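If you do need to go back to the original values, one possible workaround is to keep one fitted scaler per ID in a dict, much like the loop in the question. This is only a sketch under assumptions: the names value_cols, scalers, scaled and restored are made up for illustration, and it assumes every column except ID is numeric.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame({"ID": list("AAABBB"), "VL": [0, 10, 10, 100, 100, 200]})
value_cols = ["VL"]  # assumption: the numeric columns to scale

scalers = {}  # one fitted MinMaxScaler per ID
scaled = pd.DataFrame(index=df.index, columns=value_cols, dtype=float)

for name, group in df.groupby("ID"):
    scr = MinMaxScaler().fit(group[value_cols])
    scalers[name] = scr
    # Write the scaled rows back at their original index positions.
    scaled.loc[group.index, value_cols] = scr.transform(group[value_cols])

# Inverse direction: look up each group's own scaler again.
restored = pd.DataFrame(index=df.index, columns=value_cols, dtype=float)
for name, group in scaled.groupby(df["ID"]):
    original = scalers[name].inverse_transform(group[value_cols].to_numpy())
    restored.loc[group.index, value_cols] = original
```

The trade-off is an explicit Python loop instead of a single transform call, but each group's data_min_ and data_max_ stay available in scalers for later use.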