I have a pandas dataframe that looks like this:
c y
0 9 0
1 8 0
2 3 1
3 6 2
4 1 3
5 2 3
6 5 3
7 4 4
8 0 4
9 7 4
I'd like to groupby y
and get the min and max of c
so that my new dataframe would look like this:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
I tried using df['min'] = df.groupby(['y'])['c'].min()
but that gave me some weird results. The first 175 rows were populated in the min column but then it went to NaN for all the rest. is that not how you're supposed to use the groupby method?
Option 1 Use transform
In [13]: dfc = df.groupby('y')['c']
In [14]: df.assign(min=dfc.transform(min), max=dfc.transform(max))
Out[14]:
c y max min
0 9 0 9 8
1 8 0 9 8
2 3 1 3 3
3 6 2 6 6
4 1 3 5 1
5 2 3 5 1
6 5 3 5 1
7 4 4 7 0
8 0 4 7 0
9 7 4 7 0
Or
In [15]: df['min' ] = dfc.transform('min')
In [16]: df['max' ] = dfc.transform('max')
Option 2 Use join and agg
In [30]: df.join(df.groupby('y')['c'].agg(['min', 'max']), on='y')
Out[30]:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
Option 3 Use merge and agg
In [28]: df.merge(df.groupby('y')['c'].agg(['min', 'max']), right_index=True, left_on='y')
Out[28]:
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
With Numpy shenanigans
n = df.y.max() + 1
omax = np.ones(n, df.c.values.dtype) * df.c.values.min()
omin = np.ones(n, df.c.values.dtype) * df.c.values.max()
np.maximum.at(omax, df.y.values, df.c.values)
np.minimum.at(omin, df.y.values, df.c.values)
df.assign(min=omin[df.y], max=omax[df.y])
c y min max
0 9 0 8 9
1 8 0 8 9
2 3 1 3 3
3 6 2 6 6
4 1 3 1 5
5 2 3 1 5
6 5 3 1 5
7 4 4 0 7
8 0 4 0 7
9 7 4 0 7
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With