Select rows with highest value from groupby

Tags:

python

pandas

I have a dataframe that contains some information about users. There is a column for the user's name, a column for type, and a column for count, like this:

name         type     count
robert       x        123
robert       y        456
robert       z        5123
charlie      x        442123
charlie      y        0 
charlie      z        42

I'm trying to figure out which type has the highest count per name, so for this case, I would want to select this:

name         type    count
robert       z       5123
charlie      x       442123

I know I can do something like this to get the max count per name, but I'm not sure how I can include the "type" column, which is actually the most important part:

df.sort_values('count', ascending=False).drop_duplicates('name').sort_index()

Any help is greatly appreciated!
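To make the examples runnable, here is a minimal sketch that builds the sample data (the `pd.DataFrame` construction is an assumption; the question only shows the printed table) and runs the expression from the question. Note that `drop_duplicates` keeps whole rows, so the `type` column is already part of its result:

```python
import pandas as pd

# Sample data from the question (constructor call assumed; the question
# only shows the printed table).
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# Sort by count (largest first), keep the first row seen for each name,
# then restore the original row order. drop_duplicates keeps whole rows,
# so "type" stays in the result.
top = df.sort_values("count", ascending=False).drop_duplicates("name").sort_index()
print(top)
```

This returns the rows robert/z/5123 and charlie/x/442123, which is the desired output.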


asked Dec 18 '18 22:12

Ryan Black

3 Answers

Try this:

df.loc[df.groupby('name')['count'].idxmax()]

      name type   count
3  charlie    x  442123
2   robert    z    5123
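A self-contained sketch of the `idxmax` approach on the question's data (the DataFrame construction is assumed). It also shows, as an optional extra, how to reduce the selected rows to a plain name-to-type mapping when only the type matters:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# For each name, idxmax returns the index label of the row with the largest
# count (only the first such label if several rows tie).
best_idx = df.groupby("name")["count"].idxmax()

# .loc with those labels returns the complete rows, "type" included.
best_rows = df.loc[best_idx]

# If only the winning type is needed, turn the result into a name -> type map.
best_type = best_rows.set_index("name")["type"].to_dict()
print(best_rows)
print(best_type)
```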

If you want not just the single max value but the top n values per group (e.g. n = 2), you can do:

df.loc[df.groupby('name')['count'].nlargest(2).index.get_level_values(1)]

      name type   count
3  charlie    x  442123
5  charlie    z      42
2   robert    z    5123
1   robert    y     456
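An alternative sketch (a different technique from the `nlargest` call above, not part of the original answer): sort once, then keep the first `n` rows of each group with `groupby(...).head(n)`. This avoids unpacking a MultiIndex:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

n = 2
# Sort every row by count (largest first), then keep the first n rows of
# each name. groupby().head() returns ordinary rows with their original
# index labels, so there is no MultiIndex to unpack.
top_n = df.sort_values("count", ascending=False).groupby("name").head(n)
print(top_n)
```

The rows come back in overall count order rather than grouped by name; append `.sort_values(['name', 'count'], ascending=[True, False])` if you want them grouped.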


answered Oct 21 '22 09:10

ayorgo

Just sort by name and count, group by name, and keep the first row:

df.sort_values(['name', 'count'], ascending=False).groupby('name').first().reset_index()

This gives:

      name type   count
0  charlie    x  442123
1   robert    z    5123
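One caveat, shown as a sketch with a hypothetical missing value: `GroupBy.first()` returns the first non-null value of each column independently, so if the top row contains a NaN the result can mix values from different rows. `groupby(...).head(1)` keeps whole rows instead:

```python
import pandas as pd

# Hypothetical variant of the sample data: robert's top row has no type.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", None, "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

ordered = df.sort_values(["name", "count"], ascending=False)

# first() picks the first non-null value of each column separately, so
# robert gets count 5123 but type "y" (borrowed from the 456 row).
mixed = ordered.groupby("name").first().reset_index()

# head(1) keeps the first complete row of each group instead.
whole = ordered.groupby("name").head(1)
print(mixed)
print(whole)
```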

answered Oct 21 '22 10:10

Steven Zindel

What if a name has two rows tied for the max count, with different types? For example:

print(df)

      name type   count
0   robert    x     123
1   robert    y     456
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123
5  charlie    y       0
6  charlie    z      42

Use boolean indexing:

df[df['count'] == df.groupby('name')['count'].transform('max')]

Output:

      name type   count
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123
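A self-contained sketch of the boolean-indexing approach on the tied data (the DataFrame construction is assumed). It adds two extras that are not part of the original answer: a comparison with `idxmax`, which keeps only the first tied row, and a `groupby(...).agg(list)` step that collects every winning type per name:

```python
import pandas as pd

# Tied data from this answer: robert has two rows with count 5123.
df = pd.DataFrame({
    "name": ["robert"] * 4 + ["charlie"] * 3,
    "type": ["x", "y", "z", "a", "x", "y", "z"],
    "count": [123, 456, 5123, 5123, 442123, 0, 42],
})

# transform("max") broadcasts each group's maximum back onto every row, so the
# comparison keeps every row that ties for its group's maximum.
is_max = df["count"] == df.groupby("name")["count"].transform("max")
winners = df[is_max]

# For comparison: idxmax keeps only the first tied row per name.
first_only = df.loc[df.groupby("name")["count"].idxmax()]

# Collect all winning types per name.
types_per_name = winners.groupby("name")["type"].agg(list).to_dict()
print(winners)
print(first_only)
print(types_per_name)
```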

answered Oct 21 '22 10:10

Scott Boston
