Let's say I have a data table with 3 columns:
Category Color Date
triangle red 2017-10-10
square yellow 2017-11-10
triangle blue 2017-02-10
circle yellow 2017-07-10
circle red 2017-09-10
I want to find out the earliest date by each category. So my desired output is:
Category Color Date
square yellow 2017-11-10
triangle blue 2017-02-10
circle yellow 2017-07-10
I've looked through a couple posts about how to do this:
Finding the min date in a Pandas DF row and create new Column
Pandas groupby category, rating, get top value from each category?
With Pandas in Python, select the highest value row for each group
and more.
A popular method is the groupby method:
df.groupby('Category').first().reset_index()
But if I use this method, then it'll group by Category, but it'll keep both records for triangle since it has two different colors.
Is there a better and more efficient way to do this?
You could use sort_values + drop_duplicates:
df.sort_values(['Date']).drop_duplicates('Category', keep='first')
Category Color Date
2 triangle blue 2017-02-10
3 circle yellow 2017-07-10
1 square yellow 2017-11-10
If you want to preserve the original order of Category, you'll need to sort on a groupby call:
df.groupby('Category', group_keys=False, sort=False)\
.apply(lambda x: x.sort_values('Date'))\
.drop_duplicates('Category', keep='first')
Category Color Date
2 triangle blue 2017-02-10
1 square yellow 2017-11-10
3 circle yellow 2017-07-10
The following should give you the desired output; compare to what you posted I first sort the values according to the date as you want to keep the earliest date per category:
df.sort_values('Date').groupby('Category').first().reset_index()
That gives the desired output:
Category Color Date
0 circle yellow 2017-07-10
1 square yellow 2017-11-10
2 triangle blue 2017-02-10
EDIT
Thanks to @Wen in the comments, one can make this call also more efficient by doing:
df.sort_values('Date').groupby('Category', as_index=False).first()
which also gives
Category Color Date
0 circle yellow 2017-07-10
1 square yellow 2017-11-10
2 triangle blue 2017-02-10
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With