Filter pandas DataFrame based on max values in a column

I have a DataFrame with repeating values in the index ('Product ID'). I would like to filter it down to one row per index value, keeping the row with the greatest value in a different column ('Sales'). For example, my DataFrame looks like this:

df:

Product ID     Store     Sales
    1            A         50
    1            B        200
    1            C         20
    2            A        400
    2            B         10
    3            A        200
    4            A         50
    4            B        100
    4            C        500

I would like to filter this data down to this:

df2:

Product ID     Store     Sales
    1            B        200
    2            A        400
    3            A        200
    4            C        500

Any thoughts on how best to approach this issue in pandas?

Thanks very much for your time.

405

asked Aug 01 '14 02:08

wrcobb

1 Answer

You can perform a groupby on 'Product ID', then apply idxmax to the 'Sales' column. This creates a Series holding, for each product, the index label of the row with the highest sales. Because idxmax returns index labels rather than positions, use loc (not iloc) to select those rows from the original DataFrame; iloc only gives the same result when the index happens to be the default 0..n-1 range.

In [201]:

df.loc[df.groupby('Product ID')['Sales'].idxmax()]
Out[201]:
   Product ID Store  Sales
1           1     B    200
3           2     A    400
5           3     A    200
8           4     C    500
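
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and applies the groupby/idxmax approach. The names best_rows, df2 and df2_alt are only illustrative, and the sort_values/drop_duplicates variant is an alternative I'm adding for comparison, not part of the answer above. When several rows tie for the maximum, the two approaches may not pick the same row.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Product ID': [1, 1, 1, 2, 2, 3, 4, 4, 4],
    'Store': ['A', 'B', 'C', 'A', 'B', 'A', 'A', 'B', 'C'],
    'Sales': [50, 200, 20, 400, 10, 200, 50, 100, 500],
})

# For each product, idxmax returns the index label of the row
# with the largest Sales value
best_rows = df.groupby('Product ID')['Sales'].idxmax()

# loc selects by label, so this also works when the index
# is not the default 0..n-1 range
df2 = df.loc[best_rows]

# Alternative sketch (an assumption, not from the answer): sort by Sales,
# keep the last (largest) row per product, then restore product order
df2_alt = (
    df.sort_values('Sales')
      .drop_duplicates('Product ID', keep='last')
      .sort_values('Product ID')
)

print(df2)
```

If you don't need the original index labels afterwards, calling reset_index(drop=True) on the result gives a clean 0..n-1 index.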

130

answered Oct 05 '22 23:10

EdChum

Tags: python, pandas, numpy
