 

Group by a column and count items above 5 in another column (pandas)

So I have a df like this:

NAME    TRY SCORE  
Bob   1st   3  
Sue   1st   7  
Tom   1st   3  
Max   1st   8  
Jay   1st   4  
Mel   1st   7  
Bob   2nd   4  
Sue   2nd   2  
Tom   2nd   6  
Max   2nd   4  
Jay   2nd   7  
Mel   2nd   8  
Bob   3rd   3  
Sue   3rd   5  
Tom   3rd   6  
Max   3rd   3  
Jay   3rd   4  
Mel   3rd   6 
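To make the example reproducible, here is a sketch that builds the frame above. It assumes SCORE is stored as an integer column and TRY as plain strings; the question does not show the dtypes.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumption: SCORE is an int column).
df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'TRY': ['1st'] * 6 + ['2nd'] * 6 + ['3rd'] * 6,
    'SCORE': [3, 7, 3, 8, 4, 7,
              4, 2, 6, 4, 7, 8,
              3, 5, 6, 3, 4, 6],
})
print(df.head())
```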

I want to count how many times each person scores more than 5 and put the result
into a new df2 that looks like this:

NAME    COUNT  
Bob     0  
Sue     1  
Tom     2  
Max     1  
Jay     1  
Mel     3  

I've made many attempts; here is the latest:

df2 = df.groupby('NAME')[['SCORE'] > 5].count().reset_index(name="count")
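The attempt fails before pandas is involved: `['SCORE'] > 5` compares a Python list with an int, which raises a TypeError in Python 3. If the intent was "filter, then count", a minimal sketch of that approach follows. The `COUNT` column name is taken from the desired output; the `reindex` step is needed because people with no score above 5 (Bob) would otherwise drop out of the result after filtering.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'SCORE': [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6],
})

# Why the original attempt breaks: list vs int comparison is a TypeError.
try:
    ['SCORE'] > 5
except TypeError as exc:
    print('TypeError:', exc)

# Keep only the rows above the threshold, then count rows per name.
counts = df[df['SCORE'] > 5].groupby('NAME').size()

# Restore names with no qualifying rows as 0, in order of first appearance.
df2 = (counts.reindex(df['NAME'].unique(), fill_value=0)
             .rename_axis('NAME')
             .reset_index(name='COUNT'))
print(df2)
```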
asked Jun 06 '18 14:06 by DLB




3 Answers

Just use groupby and sum:

df.assign(SCORE=df.SCORE.gt(5)).groupby('NAME')['SCORE'].sum().astype(int).reset_index()
Out[524]: 
  NAME  SCORE
0  Bob      0
1  Jay      1
2  Max      1
3  Mel      3
4  Sue      1
5  Tom      2

Or use set_index and then sum per index level:

# Series.sum(level=...) was removed in pandas 2.0; group on the index level instead
df.set_index('NAME').SCORE.gt(5).groupby(level=0).sum().astype(int)
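To get exactly the df2 layout from the question (a COUNT column, names in their original order), here is a sketch building on the set_index variant. The `sort=False` argument and the `COUNT` name are choices made here to match the desired output, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'SCORE': [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6],
})

# Boolean mask indexed by NAME: True where the score is above 5.
above = df.set_index('NAME')['SCORE'].gt(5)

# Booleans sum as 0/1; sort=False keeps names in order of first appearance.
df2 = (above.groupby(level=0, sort=False).sum()
            .astype(int)
            .reset_index(name='COUNT'))
print(df2)
```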
answered Oct 22 '22 06:10 by BENY



First create a boolean mask, then aggregate with sum; True values are counted as 1:

df2 = (df['SCORE'] > 5).groupby(df['NAME']).sum().astype(int).reset_index(name="count")
print (df2)
  NAME  count
0  Bob      0
1  Jay      1
2  Max      1
3  Mel      3
4  Sue      1
5  Tom      2

Detail:

print (df['SCORE'] > 5)

0     False
1      True
2     False
3      True
4     False
5      True
6     False
7     False
8      True
9     False
10     True
11     True
12    False
13    False
14     True
15    False
16    False
17     True
Name: SCORE, dtype: bool
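The mechanism behind this answer can be checked on a tiny, hypothetical pair of Series (the names and scores below are made up for illustration): summing a boolean mask counts its True values, and grouping the mask by another Series with the same index sums those 1s per key.

```python
import pandas as pd

# Hypothetical mini data, not taken from the question.
names = pd.Series(['Bob', 'Mel', 'Bob', 'Mel'])
scores = pd.Series([3, 7, 6, 8])

mask = scores > 5            # [False, True, True, True]
total = int(mask.sum())      # True counts as 1, False as 0
print(total)                 # prints 3

# Grouping by a Series aligns on the shared index, then sums the 1s per name.
per_name = mask.groupby(names).sum().astype(int)
print(per_name)
```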
answered Oct 22 '22 06:10 by jezrael



One way to do this is to pass a custom aggregation function to groupby: take the scores of each group and count those greater than 5, like this:

df.groupby('NAME')['SCORE'].agg(lambda x: (x > 5).sum())


NAME
Bob    0
Jay    1
Max    1
Mel    3
Sue    1
Tom    2
Name: SCORE, dtype: int64
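The same lambda can be combined with pandas named aggregation (available since pandas 0.25) so the result column is called COUNT directly, matching the df2 in the question. This is a sketch; `sort=False` is added here only to keep the original name order. Because the lambda runs once per group in Python, it is typically slower than the vectorized mask-and-sum answers when there are many groups.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'SCORE': [3, 7, 3, 8, 4, 7, 4, 2, 6, 4, 7, 8, 3, 5, 6, 3, 4, 6],
})

# Named aggregation: keyword = output column, value = (input column, function).
df2 = (df.groupby('NAME', sort=False)
         .agg(COUNT=('SCORE', lambda s: (s > 5).sum()))
         .reset_index())
print(df2)
```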
answered Oct 22 '22 07:10 by Ted Petrou
