Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter pandas Dataframe based on max values in a column

I have a DataFrame with repeating values in the index. I would like to filter this dataset down to only show me one instance of each index by selecting the row within the index with the greatest value in a different column. For example, my DataFrame looks like this:

df:

Product ID     Store     Sales
    1            A         50
    1            B        200
    1            C         20
    2            A        400
    2            B         10
    3            A        200
    4            A         50
    4            B        100
    4            C        500

I would like to filter this data down to this:

df2:

Product ID     Store     Sales
    1            B        200
    2            A        400
    3            A        200
    4            C        500

Any thoughts on how best to approach this issue in pandas?

Thanks very much for your time -

like image 405
wrcobb Avatar asked Aug 01 '14 02:08

wrcobb


People also ask

How do I filter data based on conditions in pandas?

You can use df[df["Courses"] == 'Spark'] to filter rows by a condition in pandas DataFrame. Not that this expression returns a new DataFrame with selected rows.

How do you find the maximum value of a column in a DataFrame?

The max() method returns a Series with the maximum value of each column. By specifying the column axis ( axis='columns' ), the max() method searches column-wise and returns the maximum value for each row.


1 Answers

You can perform a groupby on 'Product ID', then apply idxmax on 'Sales' column. This will create a series with the index of the highest values. We can then use the index values to index into the original dataframe using iloc

In [201]:

df.iloc[df.groupby('Product ID')['Sales'].agg(pd.Series.idxmax)]
Out[201]:
   Product_ID Store  Sales
1           1     B    200
3           2     A    400
5           3     A    200
8           4     C    500
like image 130
EdChum Avatar answered Oct 05 '22 23:10

EdChum