I have a dataframe that contains some information about users. It has a column for the user's name, a column for type, and a column for count, like this:
name type count
robert x 123
robert y 456
robert z 5123
charlie x 442123
charlie y 0
charlie z 42
I'm trying to figure out which type has the highest count per name, so for this case, I would want to select this:
name type count
robert z 5123
charlie x 442123
I know I can do something like this to get the max count per name, but I'm not sure how to include the "type" column, which is actually the most important part:
df.sort_values('count', ascending=False).drop_duplicates('name').sort_index()
Any help is greatly appreciated!
You can group DataFrame rows into a list by calling pandas.DataFrame.groupby() on the column of interest, selecting the column you want from each group, and then using Series.apply(list) to get the list for every group.
To get the maximum value of each group, you can directly apply the pandas max() function to the selected column(s) from the result of pandas groupby.
GroupBy.max computes the maximum of each group's values. With numeric_only=True it includes only float, int, and boolean columns.
An easy solution is to apply idxmax() to the grouped column. It returns the index label of the row with the maximum value in each group (the first such row if there is a tie), and those labels can then be used to select the full rows.
Often you may be interested in finding the max value by group in a pandas DataFrame. Fortunately this is easy to do using the groupby() and max() functions with the following syntax: df.groupby('column_name').max()
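One caveat with groupby().max() is worth seeing on the question's own data: the maximum is taken column by column, so the type it reports need not come from the row that holds the largest count. A minimal sketch, with the frame rebuilt by hand from the question (the variable names are just for illustration):

```python
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# Maximum count per name: the numbers are right, but "type" is gone.
max_count = df.groupby("name")["count"].max()

# Column-wise maximum: "type" is the alphabetically largest type in each
# group, not the type of the row with the largest count.
columnwise = df.groupby("name").max()

print(max_count)
print(columnwise)
```

For charlie the column-wise result pairs type z with count 442123, even though that count belongs to type x. That mismatch is why the answers here select whole rows instead.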
Try this
df.loc[df.groupby('name')['count'].idxmax()]
      name type   count
3  charlie    x  442123
2   robert    z    5123
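Here is the idxmax approach as a self-contained sketch, with the question's frame rebuilt by hand. The names winners, result and top_type are only illustrative:

```python
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# For each name, idxmax gives the index label of the row with the largest count.
winners = df.groupby("name")["count"].idxmax()

# Selecting those labels with .loc returns the complete rows, "type" included.
result = df.loc[winners]
print(result)

# If only the winning type per name is needed, index the result by name.
top_type = result.set_index("name")["type"]
print(top_type)
```

Because the lookup goes through index labels, this assumes the index of df is unique.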
In case you want not just a single max value but the top n values per group, you can do this (e.g. n = 2):
df.loc[df.groupby('name')['count'].nlargest(2).index.get_level_values(1)]
      name type   count
3  charlie    x  442123
5  charlie    z      42
2   robert    z    5123
1   robert    y     456
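The get_level_values(1) step is there because nlargest on a grouped Series returns a result indexed by (name, original row label). A small sketch that shows the intermediate result, plus a sort-then-head alternative that I believe picks the same rows here (the alternative is my own addition, not part of the answer above):

```python
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# Two largest counts per name; the index has two levels:
# level 0 is the name, level 1 is the original row label.
top2 = df.groupby("name")["count"].nlargest(2)
print(top2)

# Level 1 holds the labels needed to pull the full rows from df.
rows = df.loc[top2.index.get_level_values(1)]
print(rows)

# Alternative sketch: sort by count, then keep the first two rows of each group.
alt = df.sort_values("count", ascending=False).groupby("name").head(2)
print(alt)
```

The two results contain the same rows but in a different order: rows is grouped by name, while alt keeps the global descending order of count.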
Just sort on name and count, group by name and keep first.
df.sort_values(['name', 'count'],ascending=False).groupby(['name']).first().reset_index()
will give you:
      name type   count
0  charlie    x  442123
1   robert    z    5123
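Two details of the sort-and-first approach are easy to miss: reset_index() renumbers the result from 0, so the original row labels are lost, and first() takes the first non-null value of each column separately. If the original labels or strictly whole rows matter, head(1) on the same sorted frame is one possible alternative. A sketch, with the question's frame rebuilt by hand:

```python
import pandas as pd

# The frame from the question, rebuilt by hand.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# Sort so that the largest count of each name comes first within its group.
ordered = df.sort_values(["name", "count"], ascending=False)

# first() aggregates per group; reset_index() then numbers the rows 0, 1, ...
first_rows = ordered.groupby("name").first().reset_index()
print(first_rows)

# head(1) keeps the first whole row of each group with its original label.
head_rows = ordered.groupby("name").head(1)
print(head_rows)
```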
What if a name has two rows tied for the max count, each with a different type?
print(df)
      name type   count
0   robert    x     123
1   robert    y     456
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123
5  charlie    y       0
6  charlie    z      42
Use boolean indexing:
df[df['count'] == df.groupby('name')['count'].transform('max')]
Output:
      name type   count
2   robert    z    5123
3   robert    a    5123
4  charlie    x  442123
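To see why ties need the transform approach, here is a sketch on the tied frame above that compares it with idxmax, which returns only the first row that reaches the maximum. The variable names are just illustrative:

```python
import pandas as pd

# The frame with a tie: robert has two rows with count 5123.
df = pd.DataFrame({
    "name": ["robert"] * 4 + ["charlie"] * 3,
    "type": ["x", "y", "z", "a", "x", "y", "z"],
    "count": [123, 456, 5123, 5123, 442123, 0, 42],
})

# transform('max') broadcasts each group's maximum back onto every row,
# so the comparison marks all rows that reach their group's maximum.
group_max = df.groupby("name")["count"].transform("max")
all_ties = df[df["count"] == group_max]
print(all_ties)

# idxmax keeps a single row per name: the first occurrence of the maximum.
first_only = df.loc[df.groupby("name")["count"].idxmax()]
print(first_only)
```

Which one to use depends on whether you want every tied type or exactly one row per name.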