Is there a way to slice a DataFrameGroupBy object?
For example, if I have:
df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})
A B
0 2 x
1 1 y
2 1 z
3 3 r
4 3 p
dfg = df.groupby('A')
Now, the returned GroupBy object is indexed by values from A, and I would like to select a subset of it, e.g. to perform aggregation. It could be something like
dfg.loc[1:2].agg(...)
or, for a specific column,
dfg['B'].loc[1:2].agg(...)
EDIT. To make it more clear: by slicing the GroupBy object I mean accessing only a subset of groups. In the above example, the GroupBy object will contain 3 groups, for A = 1, A = 2, and A = 3. For some reason, I may only be interested in the groups for A = 1 and A = 2.
It seems you need a custom function with iloc, but if you use agg, the function must return an aggregated (scalar) value for each group:
# assign to a new name so that df stays a DataFrame
out = df.groupby('A')['B'].agg(lambda x: ','.join(x.iloc[0:3]))
print (out)
A
1 y,z
2 x
3 r,p
Name: B, dtype: object
# positions 1 and 2 within each group; a one-row group gives an empty string
out = df.groupby('A')['B'].agg(lambda x: ','.join(x.iloc[1:3]))
print (out)
A
1 z
2
3 p
Name: B, dtype: object
For multiple columns:
df = pd.DataFrame({'A': [2, 1, 1, 3, 3],
'B': ['x', 'y', 'z', 'r', 'p'],
'C': ['g', 'y', 'y', 'u', 'k']})
print (df)
A B C
0 2 x g
1 1 y y
2 1 z y
3 3 r u
4 3 p k
df = df.groupby('A').agg(lambda x: ','.join(x.iloc[1:3]))
print (df)
B C
A
1 z y
2
3 p k
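As a rough alternative to the lambda, the same within-group positions can be picked with a boolean mask built from cumcount, which numbers the rows inside each group starting at 0. This is only a sketch under one assumption that differs from the output above: groups that have no rows in the requested positions (here A = 2) are dropped instead of producing an empty string. The names pos, picked and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

# position of each row within its group: 0 for the first row, 1 for the second, ...
pos = df.groupby('A').cumcount()

# keep rows at positions 1 and 2 of every group (between is inclusive on both ends)
picked = df[pos.between(1, 2)]

# aggregate only the rows that were kept
result = picked.groupby('A')['B'].agg(lambda s: ','.join(s))
print(result)
```

Because the filtering happens before the aggregation, any aggregation function can be used afterwards, not only ones written around iloc.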
You can slice with apply like this, if you want to slice [1:3] from each group. Note that .loc slices by index label, not by position, and includes both endpoints:
In [53]: df
Out[53]:
A B
0 2 x
1 1 y
2 1 z
3 3 r
4 3 p
In [54]: dfg = df.groupby('A')
In [56]: dfg.apply(lambda x: x.loc[1:3])
Out[56]:
A B
A
1 1 1 y
2 1 z
3 3 3 r
If you want to slice only one column (B, for example):
In [55]: dfg.apply(lambda x: x['B'].loc[1:3])
Out[55]:
A
1 1 y
2 z
3 3 r
Name: B, dtype: object
Then, to aggregate, you just chain the call like this:
dfg.apply(lambda x: x['B'].loc[1:3]).agg(...)
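Neither approach above restricts the computation to a subset of groups, which is what the EDIT in the question asks for. A minimal sketch of two ways to get there, assuming the groups of interest are A = 1 and A = 2 and using a comma join as a stand-in aggregation: either filter the rows before grouping, or aggregate everything and slice the result, whose index holds the group keys. The names wanted, filtered_first and sliced_after are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 3, 3], 'B': ['x', 'y', 'z', 'r', 'p']})

wanted = [1, 2]  # group keys to keep (assumed for this example)

# Option 1: drop unwanted rows first, then group and aggregate
filtered_first = (
    df[df['A'].isin(wanted)]
    .groupby('A')['B']
    .agg(lambda s: ','.join(s))
)

# Option 2: aggregate all groups, then slice the result by group label.
# The result is indexed by A, so .loc[1:2] selects the groups 1 and 2.
sliced_after = df.groupby('A')['B'].agg(lambda s: ','.join(s)).loc[1:2]

print(filtered_first)
print(sliced_after)
```

Option 1 avoids computing the aggregation for groups that are thrown away, which may matter when the aggregation is expensive; Option 2 is closer to the dfg.loc[1:2] syntax imagined in the question.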