Pandas Aggregate/Group by based on most recent date

I have a DataFrame as follows, where Id is a string and Date is a datetime:

Id    Date
1     3-1-2012
1     4-8-2013
2     1-17-2013
2     5-4-2013
2     10-30-2012
3     1-3-2013

I'd like to consolidate the table to just show one row for each Id which has the most recent date.
Any thoughts on how to do this?

What does Group_by do in pandas?

groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names. sort : Sort group keys.

What does AGG function do in pandas?

Pandas DataFrame agg() Method The agg() method allows you to apply a function or a list of function names to be executed along one of the axis of the DataFrame, default 0, which is the index (row) axis. Note: the agg() method is an alias of the aggregate() method.

What is Groupby AGG?

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame. apply. For a DataFrame, can pass a dict, if the keys are DataFrame column names.

You can groupby the Id field:

In [11]: df
Out[11]:
  Id                Date
0  1 2012-03-01 00:00:00
1  1 2013-04-08 00:00:00
2  2 2013-01-17 00:00:00
3  2 2013-05-04 00:00:00
4  2 2012-10-30 00:00:00
5  3 2013-01-03 00:00:00

In [12]: g = df.groupby('Id')

If you are not certain about the ordering, you could do something along the lines:

In [13]: g.agg(lambda x: x.iloc[x.Date.argmax()])
Out[13]:
                  Date
Id
1  2013-04-08 00:00:00
2  2013-05-04 00:00:00
3  2013-01-03 00:00:00

which for each group grabs the row with largest (latest) date (the argmax part).

If you knew they were in order you could take the last (or first) entry:

In [14]: g.last()
Out[14]:
                  Date
Id
1  2013-04-08 00:00:00
2  2012-10-30 00:00:00
3  2013-01-03 00:00:00

(Note: they're not in order, so this doesn't work in this case!)

In the Hayden response, I think that using x.loc in place of x.iloc is better, as the index of the df dataframe could be sparse (and in this case the iloc will not work).

(I haven't enought points on stackoverflow to post it in comments of the response).

Pandas Aggregate/Group by based on most recent date

Tags:

pandas

python-2.7

ChrisArmstrong

People also ask

2 Answers

Andy Hayden

head7

Recent Activity

Donate For Us

Pandas Aggregate/Group by based on most recent date

Tags:

pandas

python-2.7

ChrisArmstrong

People also ask

2 Answers

Andy Hayden

head7

Related questions

Recent Activity

Donate For Us