Count items greater than a value in pandas groupby

I have the Yelp dataset and I want to count all reviews which have greater than 3 stars. I get the count of reviews by doing this:

reviews.groupby('business_id')['stars'].count()

Now I want to get the count of reviews which had more than 3 stars, so I tried this by taking inspiration from here:

reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).count()})

But this just gives me the count of all stars, like before. Is this the right way to do it, and what am I doing wrong? Does the lambda expression not go through each value of the stars column?

EDIT: Okay, I feel stupid. I should have used the sum function instead of count to get the number of elements greater than 3, like this:

reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).sum()})
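To make the difference concrete, here is a minimal sketch on a made-up Series (the ratings are hypothetical, not taken from the Yelp data). The comparison produces a boolean Series; count() tallies every non-null entry whether it is True or False, while sum() treats True as 1 and False as 0.

```python
import pandas as pd

# Hypothetical star ratings for a single business
stars = pd.Series([5, 2, 4, 1, 3])

# Element-wise comparison: True, False, True, False, False
mask = stars > 3

# count() counts all non-null entries, regardless of their truth value
n_non_null = mask.count()

# sum() adds True as 1 and False as 0, so it counts only the matches
n_above = mask.sum()

print(n_non_null, n_above)
```

So the lambda does see every value; it is count() that ignores whether the comparison came out True or False.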
rookie Avatar asked Nov 20 '16 23:11


2 Answers

You can filter the rows first and then group:

reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
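One thing to watch with filtering first: a business with no review above 3 stars has no rows left after the filter, so it disappears from the result instead of showing a count of 0. The sketch below uses a made-up miniature of the reviews table (the ids and ratings are assumptions) and shows one possible alternative, grouping the boolean mask itself, which keeps every business.

```python
import pandas as pd

# Hypothetical miniature of the reviews table
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars":       [5,   4,   2,   1,   3,   4],
})

# Filter first, then count: business "b" has no review above 3 and drops out
filtered = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

# Group the boolean mask by business and sum it: every business keeps a row
with_zeros = (reviews["stars"] > 3).groupby(reviews["business_id"]).sum()

print(filtered)
print(with_zeros)
```

Which form is preferable depends on whether businesses with zero matching reviews should appear in the output.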
Mohamed AL ANI Avatar answered Oct 06 '22 00:10


As I also wanted to rename the column and to run multiple functions on the same column, I came up with the following solution:

import pandas

# Count reviews above and below 3 stars for each business
reviews.groupby('business_id')\
       .agg(over=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x > 3).sum()),
            under=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x < 3).sum()))\
       .reset_index()

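pandas.NamedAgg lets you name each output column and run several functions on the same input column. It replaces the dict-based renaming in agg (the form used in the question), which was deprecated in pandas 0.20 and removed in pandas 1.0.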
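For reference, named aggregation also accepts plain (column, aggfunc) tuples as a shorthand for pandas.NamedAgg. The following is a small sketch on made-up data (the table contents and the extra total column are assumptions added for illustration), not the only way to write it.

```python
import pandas as pd

# Hypothetical miniature of the reviews table
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars":       [5,   4,   2,   1,   3,   4],
})

summary = (
    reviews.groupby("business_id")
    .agg(
        total=("stars", "count"),                  # all reviews per business
        over=("stars", lambda s: (s > 3).sum()),   # reviews above 3 stars
        under=("stars", lambda s: (s < 3).sum()),  # reviews below 3 stars
    )
    .reset_index()
)

print(summary)
```

Each keyword becomes an output column name, so no separate renaming step is needed.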

Esben Eickhardt Avatar answered Oct 06 '22 00:10
