 

Grab rows with max date from pandas dataframe

Tags:

python

pandas

I have a Pandas dataframe that looks like this:

[Image: sample input DataFrame]

For each distinct ID, I want to grab the row with the max date, so that my final result looks something like this:

[Image: desired output, one row per ID]

My date column has dtype 'object'. I have tried grouping and then grabbing the max, like this:

idx = df.groupby(['ID','Item'])['date'].transform(max) == df['date']
df_new = df[idx]

However, I am unable to get the desired result.
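Because the date column has dtype 'object', its values are most likely strings, and max() on strings compares them character by character: "9/1/2018" sorts after "10/1/2018". The sketch below, using hypothetical sample data (the real frames were only shown as images, and the month/day/year format is an assumption), converts the column with pd.to_datetime before taking the per-ID max. It also groups by 'ID' alone, since grouping by ['ID', 'Item'] keeps the latest row per ID/Item pair rather than per ID.

```python
import pandas as pd

# Hypothetical sample data: the question's frame was only shown as an image.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Item": ["a", "b", "c", "d"],
    "date": ["9/1/2018", "10/1/2018", "1/15/2018", "3/2/2018"],  # strings -> dtype 'object'
})

# While the dates are strings, max() is lexicographic: "9/1/2018" > "10/1/2018".
lex_max = df.groupby("ID")["date"].max()
print(lex_max)

# Parse the strings into real datetimes (assumed month/day/year format).
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Keep the rows whose date equals the latest date of their ID group.
mask = df.groupby("ID")["date"].transform("max") == df["date"]
df_new = df[mask]
print(df_new)
```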

asked Nov 7, 2018 at 22:11 by user3116949





1 Answer

idxmax

This works as long as the index is unique, or at least the index labels of the maximal rows aren't repeated.

df.loc[df.groupby('ID').date.idxmax()]
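A small runnable illustration of the idxmax approach, on hypothetical data (the 'value' column is made up for the example). Dates are parsed up front so the comparison is chronological rather than string-based. idxmax returns one index label per ID, so .loc yields exactly one row per ID.

```python
import pandas as pd

# Hypothetical data with real datetime values.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "value": [10, 20, 30, 40, 50],
    "date": pd.to_datetime(
        ["2018-01-05", "2018-03-01", "2018-02-10", "2018-02-11", "2018-01-01"]
    ),
})

# Per ID, the index label of the row holding the latest date.
labels = df.groupby("ID")["date"].idxmax()

# Select those rows by label: one row per ID.
latest = df.loc[labels]
print(latest)
```

If the index contains duplicate labels, .loc can return more than one row per label; calling df.reset_index(drop=True) first gives a unique index.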

OP's approach (edited)

This works as long as the maximal date within each ID is unique. Otherwise, you'll get all rows equal to the maximum.

df[df.groupby('ID')['date'].transform('max') == df['date']]
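To see the tie behaviour described above, here is a sketch on hypothetical data where ID 2 has two rows on its latest date. transform('max') broadcasts each group's maximum back onto every row, so the comparison produces a boolean mask aligned with df, and every tied row passes it.

```python
import pandas as pd

# Hypothetical data: rows 2 and 3 share ID 2's latest date.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 2],
    "value": [10, 20, 30, 40, 50],
    "date": pd.to_datetime(
        ["2018-01-05", "2018-03-01", "2018-02-11", "2018-02-11", "2018-01-01"]
    ),
})

# Each row gets its group's max date; compare element-wise to build a mask.
is_latest = df.groupby("ID")["date"].transform("max") == df["date"]
latest = df[is_latest]
print(latest)  # ID 2 contributes both tied rows
```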

W-B's go-to solution

Also a very good solution:

df.sort_values(['ID', 'date']).drop_duplicates('ID', keep='last')
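The subset passed to drop_duplicates names the column(s) that define a duplicate, so it has to be 'ID' to keep one row per ID; deduplicating on 'date' would instead drop rows that merely share a date. A sketch on hypothetical, deliberately unsorted data:

```python
import pandas as pd

# Hypothetical data in no particular order.
df = pd.DataFrame({
    "ID": [2, 1, 2, 1],
    "value": [30, 10, 40, 20],
    "date": pd.to_datetime(["2018-02-10", "2018-01-05", "2018-02-11", "2018-03-01"]),
})

# Sort so that, within each ID, the latest date comes last ...
ordered = df.sort_values(["ID", "date"])

# ... then keep only the last row seen for each ID.
latest = ordered.drop_duplicates("ID", keep="last")
print(latest)
```

Unlike the transform('max') mask, this always returns exactly one row per ID, even when dates tie.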
answered Oct 2, 2022 at 14:10 by piRSquared
