I am trying to return a groupby object that contains only items with 3 or more entries for the field I am grouping on. For instance, if I have the following object and I group on 'Letters'
:
Letters Numbers Items Bool
A 1 lamp 1
B 2 glass 1
B 2 table 1
C 5 pic 0
B 3 glass 0
C 4 bottle 1
C 2 horse 0
Then my code would only return a groupby object with b
I have tried the following:
old_df = a_df.groupby('Letters')
new_df = old_df.filter(len(old_df['Letters'])>2)
I've also tried with some lambda stuff mentioned here: filtering grouped df in pandas
Thanks!
So if I wanted to return a vector with the sum of all the values in Bool per letter where the number of entries per letter exceeds three, how would I do it? With the data provided, this would be [2,1] (2 for B and 1 for C)
I think you're misunderstanding the output of groupby
:
In [4]: df
Out[4]:
Letters Numbers Items Bool
0 A 1 lamp 1
1 B 2 glass 1
2 B 2 table 1
3 C 5 pic 0
4 B 3 glass 0
In [6]: df.groupby('Letters').filter(lambda x: len(x) > 2)
Out[6]:
Letters Numbers Items Bool
1 B 2 glass 1
2 B 2 table 1
4 B 3 glass 0
The object that results from a groupby
operation is an iterable object, and its "iterands" are tuple
s of the group label and the subset of the original DataFrame
corresponding to the group. For example,
In [9]: for name, subset in df.groupby('Letters'):
...: print(name)
...: print(subset)
...: print('\n')
...:
A
Letters Numbers Items Bool
0 A 1 lamp 1
B
Letters Numbers Items Bool
1 B 2 glass 1
2 B 2 table 1
4 B 3 glass 0
C
Letters Numbers Items Bool
3 C 5 pic 0
To sum the values of Bool
by Letter
where the number of members in each group is greater than 2:
In [39]: df.groupby('Letters').filter(lambda x: len(x) > 2).groupby('Letters').Bool.sum()
Out[39]:
Letters
B 2
C 1
Name: Bool, dtype: int64
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With