So I have a df like this:
NAME TRY SCORE
Bob 1st 3
Sue 1st 7
Tom 1st 3
Max 1st 8
Jay 1st 4
Mel 1st 7
Bob 2nd 4
Sue 2nd 2
Tom 2nd 6
Max 2nd 4
Jay 2nd 7
Mel 2nd 8
Bob 3rd 3
Sue 3rd 5
Tom 3rd 6
Max 3rd 3
Jay 3rd 4
Mel 3rd 6
I want to count how many times each person scores more than 5,
and put the result into a new df2 that looks like this:
NAME COUNT
Bob 0
Sue 1
Tom 2
Max 1
Jay 1
Mel 3
My attempts have been many - here is the latest
df2 = df.groupby('NAME')[['SCORE'] > 5].count().reset_index(name="count")
Groupby count on a single column: groupby() takes the column name as its argument and is followed by count().
Groupby count in pandas can be done on a single column or on multiple columns, and in several ways, among them groupby() followed by count() and the aggregate() function.
Alternatively, you can get the group count by using agg() or aggregate() and passing 'count' as the aggregation. reset_index() then moves the group keys from the index back into ordinary columns. With this approach you can also compute multiple aggregations at once.
Similar to the SQL GROUP BY statement, the pandas .groupby() method works by splitting the data, aggregating it in a given way (or ways), and recombining the results. Because .groupby() first splits the data, we can also work with the groups directly.
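As a rough sketch of the difference between counting rows per group and working with the groups directly, assuming the df from the question is rebuilt by hand (the names rows_per_name, summary and high_scores are my own, not from the post):

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
df = pd.DataFrame({
    "NAME": names * 3,
    "TRY": ["1st"] * 6 + ["2nd"] * 6 + ["3rd"] * 6,
    "SCORE": [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6],
})

# count() counts the non-null rows in each group, whatever the score is,
# which is why it cannot answer "how many scores are above 5" on its own.
rows_per_name = df.groupby("NAME")["SCORE"].count()

# agg() accepts several aggregations at once; "count" matches count() above.
summary = df.groupby("NAME")["SCORE"].agg(["count", "max"])

# Iterating over the groupby object yields (key, sub-frame) pairs,
# so the condition can be applied to each group by hand.
high_scores = {
    name: int((group["SCORE"] > 5).sum())
    for name, group in df.groupby("NAME")
}
```

Here rows_per_name is 3 for every name because each person has three tries; only high_scores applies the "more than 5" condition.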
Just using groupby and sum:
df.assign(SCORE=df.SCORE.gt(5)).groupby('NAME')['SCORE'].sum().astype(int).reset_index()
Out[524]:
NAME SCORE
0 Bob 0
1 Jay 1
2 Max 1
3 Mel 3
4 Sue 1
5 Tom 2
Or use set_index with sum, grouping on the index level (sum(level=0) was removed in pandas 2.0):
df.set_index('NAME').SCORE.gt(5).groupby(level=0).sum().astype(int)
First create a boolean mask and then aggregate by sum (True values are processed like 1):
df2 = (df['SCORE'] > 5).groupby(df['NAME']).sum().astype(int).reset_index(name="count")
print (df2)
NAME count
0 Bob 0
1 Jay 1
2 Max 1
3 Mel 3
4 Sue 1
5 Tom 2
Detail:
print (df['SCORE'] > 5)
0 False
1 True
2 False
3 True
4 False
5 True
6 False
7 False
8 True
9 False
10 True
11 True
12 False
13 False
14 True
15 False
16 False
17 True
Name: SCORE, dtype: bool
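For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same mask-then-sum idea. The frame is rebuilt by hand from the question, and two details are my own choices rather than part of the answer above: sort=False, to keep the names in the order they first appear, and name="COUNT", to match the column header the question asks for.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
df = pd.DataFrame({
    "NAME": names * 3,
    "TRY": ["1st"] * 6 + ["2nd"] * 6 + ["3rd"] * 6,
    "SCORE": [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6],
})

# Boolean mask: True where the score is above 5.
mask = df["SCORE"] > 5

# Summing booleans per name counts the True values.
# sort=False keeps names in order of first appearance instead of alphabetical.
df2 = (
    mask.groupby(df["NAME"], sort=False)
        .sum()
        .astype(int)
        .reset_index(name="COUNT")
)
print(df2)
```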
One way to do this is to pass a custom function to the groupby, which takes the scores of each group and counts those that are greater than 5, like this:
df.groupby('NAME')['SCORE'].agg(lambda x: (x > 5).sum())
NAME
Bob 0
Jay 1
Max 1
Mel 3
Sue 1
Tom 2
Name: SCORE, dtype: int64
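The result above is a Series indexed by NAME. If a two-column frame like the requested df2 is wanted, one possible variation is pandas named aggregation, sketched below. The frame is rebuilt by hand from the question, and the output column name COUNT is taken from the question, not from this answer.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the post).
names = ["Bob", "Sue", "Tom", "Max", "Jay", "Mel"]
df = pd.DataFrame({
    "NAME": names * 3,
    "TRY": ["1st"] * 6 + ["2nd"] * 6 + ["3rd"] * 6,
    "SCORE": [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6],
})

# Named aggregation: COUNT=(input column, function applied to each group).
# The lambda counts how many scores in the group are above 5.
df2 = (
    df.groupby("NAME")
      .agg(COUNT=("SCORE", lambda s: (s > 5).sum()))
      .reset_index()
)
print(df2)
```

Calling a Python lambda once per group is usually slower than the vectorised mask-and-sum approach on large frames, so this form is mostly useful when the per-group logic is more involved than a simple comparison.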