Select rows with highest value from groupby

Tags:

python

pandas

I have a dataframe that contains some information about users. It has a column for the user's name, a column for type, and a column for count, like this:

name         type     count
robert       x        123
robert       y        456
robert       z        5123
charlie      x        442123
charlie      y        0 
charlie      z        42

I'm trying to figure out which type has the highest count per name, so for this case, I would want to select this:

name         type    count
robert       z       5123
charlie      x       442123

I know I can do something like this to get the max count per name, but I'm not sure how to include the "type" column, which is actually the most important part:

df.sort_values('count', ascending=False).drop_duplicates('name').sort_index()

Any help is greatly appreciated!

Ryan Black asked Dec 18 '18 22:12


3 Answers

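Try this. idxmax returns, for each name, the index label of the row with the highest count, and df.loc then selects those rows with all of their columns, including type:

df.loc[df.groupby('name')['count'].idxmax()]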

      name type   count
3  charlie    x  442123
2   robert    z    5123
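
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's sample data and applies the idxmax approach. The variable names `winners` and `best` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# For each name, the index label of the row holding the largest count.
winners = df.groupby("name")["count"].idxmax()

# Selecting by those labels returns whole rows, so "type" comes along.
best = df.loc[winners]
print(best)
```

Note that idxmax keeps only the first row it finds per group, so a tie for the max yields a single row per name.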

If you want the top n values per group rather than just a single max, you can do this (e.g. n = 2):

df.loc[df.groupby('name')['count'].nlargest(2).index.get_level_values(1)]

      name type   count
3  charlie    x  442123
5  charlie    z      42
2   robert    z    5123
1   robert    y     456
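
As a possible alternative for the top-n case (a different technique from the nlargest line above, not a replacement for it), you could sort first and then keep the first n rows of each group with head. This is only a sketch on the question's data; `n` and `top_n` are illustrative names.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

n = 2  # how many rows to keep per name (illustrative value)

# Sort names ascending and counts descending, then keep the first n rows
# of every group. head() keeps the original index labels and all columns.
top_n = (
    df.sort_values(["name", "count"], ascending=[True, False])
      .groupby("name")
      .head(n)
)
print(top_n)
```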
ayorgo answered Oct 21 '22 09:10


Just sort on name and count, group by name and keep first.

df.sort_values(['name', 'count'], ascending=False).groupby(['name']).first().reset_index()

will give you (the index is renumbered because of reset_index):

      name type   count
0  charlie    x  442123
1   robert    z    5123
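
One caveat that may matter if your data has gaps: first() picks the first non-null value of each column separately, so with missing values it can combine values from different rows. The sketch below uses made-up data (a missing type on the max row, which is not in the question) to show the difference from head(1), which keeps the row intact.

```python
import pandas as pd

# Hypothetical data: the row with the highest count has no type recorded.
df = pd.DataFrame({
    "name": ["robert", "robert"],
    "type": [None, "y"],
    "count": [5123, 456],
})

ordered = df.sort_values(["name", "count"], ascending=False)

# first() takes the first non-null value per column, so "type" is taken
# from the second row while "count" is taken from the first.
via_first = ordered.groupby("name").first().reset_index()

# head(1) keeps the first row of each group as a whole, missing type included.
via_head = ordered.groupby("name").head(1)

print(via_first)
print(via_head)
```

If none of your columns can be null, both give the same rows.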
Steven Zindel answered Oct 21 '22 10:10


What if a name has two rows tied for the max count, with different types?

print(df)

      name type   count
0   robert    x     123
1   robert    y     456
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123
5  charlie    y       0
6  charlie    z      42

Use boolean indexing:

df[df['count'] == df.groupby('name')['count'].transform('max')]

Output:

      name type   count
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123
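
Here is a runnable sketch of the same idea on the tied data, with an idxmax lookup next to it for comparison. The names `group_max`, `all_max` and `one_per_name` are only illustrative.

```python
import pandas as pd

# The tied sample data printed above: robert has two rows with count 5123.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "robert",
             "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "a", "x", "y", "z"],
    "count": [123, 456, 5123, 5123, 442123, 0, 42],
})

# transform('max') broadcasts each group's max back onto every row,
# so the result lines up with df and can be compared element-wise.
group_max = df.groupby("name")["count"].transform("max")

# Boolean indexing keeps every row that equals its group's max (ties included).
all_max = df[df["count"] == group_max]

# For comparison: idxmax keeps only the first max row per name.
one_per_name = df.loc[df.groupby("name")["count"].idxmax()]

print(all_max)
print(one_per_name)
```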
Scott Boston answered Oct 21 '22 10:10