
group by pandas dataframe and select latest in each group

How do I group the rows of a pandas DataFrame and select the latest row (by date) from each group?

For example, given a dataframe sorted by date:

        id  product        date
    0  220     6647  2014-09-01
    1  220     6647  2014-09-03
    2  220     6647  2014-10-16
    3  826     3380  2014-11-11
    4  826     3380  2014-12-09
    5  826     3380  2015-05-19
    6  901     4555  2014-09-01
    7  901     4555  2014-10-05
    8  901     4555  2014-11-01

Grouping by id or product and selecting the latest row in each group gives:

        id  product        date
    2  220     6647  2014-10-16
    5  826     3380  2015-05-19
    8  901     4555  2014-11-01
DevEx, asked Jan 07 '17 20:01




1 Answer

You can use tail with groupby to get the last n rows of each group. Sort by date first, so that the last row of each group is the latest one:

    df.sort_values('date').groupby('id').tail(1)

        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
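For anyone who wants to run this, here is a minimal self-contained sketch. It rebuilds the example frame from the question and assumes the date column is parsed with pd.to_datetime (the question does not say what dtype it has). It also shows idxmax as an alternative that avoids the sort; that variant is not part of the answer above, only a commonly used option, and the names latest_tail and latest_idx are illustrative.

```python
import pandas as pd

# Example data from the question; parsing the dates is an assumption.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-09", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Approach from the answer: sort by date, then keep the last row per id.
# The result keeps the order of the sorted frame, i.e. it is ordered by date.
latest_tail = df.sort_values("date").groupby("id").tail(1)

# Alternative (assumption, not from the answer): look up the index label of
# the maximum date in each group. No sort is needed, and the result is
# ordered by the group key instead of by date.
latest_idx = df.loc[df.groupby("id")["date"].idxmax()]

print(latest_tail)
print(latest_idx)
```

Both approaches select the same three rows and differ only in row order. If several rows in a group share the latest date, each approach returns a single row per group, though not necessarily the same one.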
nipy, answered Sep 21 '22 19:09