Pandas groupby in combination with sklearn preprocessing

Tags:

pandas

scipy

I want to group my DataFrame by a specific column, apply sklearn's preprocessing MinMaxScaler to each group, and store the fitted scaler object for each group.

My current starting point:

import pandas as pd
from sklearn import preprocessing

scaler = {}
groups = df.groupby('ID')

for name, group in groups:
  scr = preprocessing.MinMaxScaler()
  scr.fit(group)
  scaler.update({name: scr})
  group = scr.transform(group)

Is this possible with df.groupby('ID').transform?

UPDATE

From my original DataFrame

pd.DataFrame(dict(ID=list('AAABBB'),
                  VL=(0, 10, 10, 100, 100, 200)))

I want to scale all columns based on ID. In this example:

   A 0.0
   A 1.0
   A 1.0
   B 0.0
   B 0.0
   B 1.0

with the information / scaler object (initialized with fit)

preprocessing.MinMaxScaler().fit( ... )
Roby asked Mar 13 '17 20:03



1 Answer

You can do it in one direction (forward scaling only):

In [62]: from sklearn.preprocessing import minmax_scale

In [63]: df
Out[63]:
  ID   VL
0  A    0
1  A   10
2  A   10
3  B  100
4  B  100
5  B  200

In [64]: df['SC'] = df.groupby('ID').VL.transform(lambda x: minmax_scale(x.astype(float)))

In [65]: df
Out[65]:
  ID   VL  SC
0  A    0   0
1  A   10   1
2  A   10   1
3  B  100   0
4  B  100   0
5  B  200   1

But you will not be able to use inverse_transform this way: each call of MinMaxScaler (for each group or each ID) will overwrite the information about your original features, and minmax_scale in particular returns only the scaled values, not a fitted scaler.
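If you do need inverse_transform, one possible workaround is to fit a separate MinMaxScaler per group and keep them in a dict keyed by ID, much like the loop in the question. This is only a minimal sketch, not part of the original answer; the names value_cols, scalers, scaled and restored_b are made up for illustration, and it assumes every column other than ID is numeric:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(dict(ID=list('AAABBB'),
                       VL=(0, 10, 10, 100, 100, 200)))

value_cols = ['VL']                     # assumption: the numeric columns to scale
scalers = {}                            # one fitted scaler per ID
scaled = df[value_cols].astype(float)   # float copy that receives the scaled values

for name, group in df.groupby('ID'):
    scr = MinMaxScaler()
    # Fit on this group's rows only; fit_transform returns a NumPy array
    values = group[value_cols].to_numpy(dtype=float)
    scaled.loc[group.index, value_cols] = scr.fit_transform(values)
    scalers[name] = scr                 # keep the fitted scaler for later

df['SC'] = scaled['VL']

# Undo the scaling for group 'B' using its own stored scaler
mask = df['ID'] == 'B'
restored_b = scalers['B'].inverse_transform(df.loc[mask, ['SC']].to_numpy())

print(df)
print(restored_b.ravel())
```

Because each ID has its own scaler object, nothing is overwritten between groups, and scalers[name].inverse_transform(...) maps that group's scaled values back to the original range. The cost is an explicit Python loop instead of a single groupby(...).transform call.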

MaxU - stop WAR against UA answered Oct 02 '22 00:10