I have the following table:
Item number | crit_A | crit_B|
------------|--------|-------|
1 | 100 | 20 |
1 | 10 | 100 |
1 | 50 | 50 |
2 | 10 | 100 |
2 | 90 | 10 |
2 | 90 | 10 |
I would like a pandas dataframe operation to only return the first and 5th row. This corresponds to the rows where crit_A is max for a given item.
Item number | crit_A | crit_B|
------------|--------|-------|
1 | 100 | 20 |
2 | 90 | 10 |
Note: When crit_A has multiple equal values for a given item, I just need one item returned.
The following is not what I am looking for:
res_82_df.groupby(['Item number']).max()
This does not work because it would group by Item number but return the maximum value for all columns. Also note: I could look for an arbitrary threshold and perform a query statement. But this approach is also not robust because I would always need to look at the data and make a value judgement.
How do I accomplish this efficiently?
Note: My question indeed is a duplicate of the question linked above. The answer here though is very unique and much more concise, and does what I ask it to.
I'd do it this way:
In [107]: df.loc[df.groupby('Item number')['crit_A'].idxmax()]
Out[107]:
Item number crit_A crit_B
0 1 100 20
4 2 90 10
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With