I have a DataFrame like
customer_id product_code No_visits
1 90 15
1 105 30
1 100 45
2 50 60
2 100 45
2 10 30
I want to choose the customer_id, product_code corresponding with having the highest no_visits
customer_id product_code No_visits
1 100 45
2 50 60
df.groupby('customer_id')['product_code, no_visits'].max()
which gives me the results of maximum product_code and maximum no_visits, which are absolutely not what I want. Please help
This is what idxmax
is for. Grab the index of the maximum value then use that index in a call to loc
.
This will produce identical results to MaxU but potentially different results as AmiTavory. Ami's results may produce more than one result per customer_id
if there is more than one row that has a No_visits
value equal to the maximum value. Mine and MaxU's answers will select whichever row came first and is equal to the max.
df.loc[df.groupby('customer_id').No_visits.idxmax()]
customer_id product_code No_visits
2 1 100 45
3 2 50 60
You can use
>>> df[df.No_visits == df.No_visits.groupby(df.customer_id).transform(max)]
customer_id product_code No_visits
2 1 100 45
3 2 50 60
Explanation
df.No_visits == df.No_visits.groupby(df.customer_id).transform(max)
finds, for each row, the maximal number of visits for the customer id of that row. Then you simply filter on the rows for those where the number of visits is equal to that number.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With