Pandas get first and last value of column from group

Hi, I have a dataframe that contains multiple rows for the same ID. One of the columns is a Date (in ascending order). I want to calculate the date difference between the first entry and the last.

I am doing this by building a DataFrame from the grouped values as follows:

g = df.groupby('ID')

print(pd.DataFrame({'first':g.Date.nth(0), 'last':g.Date.nth(-1)}))

The first value is correct, however, the last value is nowhere near correct.

For example, for a specific ID, the first date is 2000-05-08 and the last date is 8/21/2010. The output is:

               first       last
ID                         
31965.0        2000-05-08  2002-12-29

2002-12-29 is somewhere in the middle.

Sample Data:

ID  Date
31965   5/8/2000
31965   5/10/2000
31965   5/18/2000
31965   5/22/2000
31965   5/23/2000
31965   5/25/2000
31965   5/30/2000
31965   6/7/2000
31965   6/8/2000
31965   6/11/2000
31965   6/13/2000
.....
31965   4/11/2009
31965   5/9/2009
31965   5/16/2009
31965   5/23/2009
31965   2/5/2010
31965   2/26/2010
31965   3/13/2010
31965   4/10/2010
31965   8/21/2010

I want my result for ID 31965 to be: 5/8/2000 and 8/21/2010 so that I can eventually work out the date difference.

David asked Mar 05 '18 00:03


2 Answers

You can do this in one step; just make sure your 'Date' column has a datetime dtype:

df['Date'] = pd.to_datetime(df['Date'])

df.groupby('ID')['Date'].agg(['first','last'])

Now, I suspect your data isn't ordered correctly. If you still want the earliest and the latest date, you can do this:

df.groupby('ID')['Date'].agg(['min','max']).rename(columns={'min':'first','max':'last'})

Or you can sort by date with sort_values first:

df.sort_values('Date').groupby('ID')['Date'].agg(['first','last'])
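
Putting the pieces above together, here is a minimal sketch that also computes the date difference the question is after. The frame is made-up sample data (the IDs and the out-of-order row are assumptions for illustration, not the asker's real data), and the `span` and `span_days` column names are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; the rows for ID 1 are deliberately not in date order.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Date": ["5/8/2000", "8/21/2010", "12/29/2002", "1/1/2001", "1/31/2001"],
})

# Parse the month/day/year strings into real datetimes before comparing them.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# min/max do not depend on row order, unlike first/last.
result = (
    df.groupby("ID")["Date"]
    .agg(["min", "max"])
    .rename(columns={"min": "first", "max": "last"})
)

# Subtracting two datetime columns gives a Timedelta column.
result["span"] = result["last"] - result["first"]
result["span_days"] = result["span"].dt.days
print(result)
```

With unsorted input like this, 'first' and 'last' would return whatever rows happen to come first and last within each group, which is why the sketch uses 'min' and 'max'.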
Scott Boston answered Oct 03 '22 17:10


You might have to parse the last date like this:

from datetime import datetime

def parser(x):
    # Parse a month/day/year string such as '8/21/2010'
    return datetime.strptime(str(x), '%m/%d/%Y')

Here, you feed your date string into the function, and it returns a parsed datetime. You can parse the first date the same way, so that it is consistent with the last date; the only thing you might need to change is the format string %m/%d/%Y. That should solve your problem. Read this page for more information: https://docs.python.org/2/library/datetime.html
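
As a rough sketch of how that parser could be used, assuming the dates arrive as month/day/year strings like those in the question's sample data (the variable names `first`, `last`, `delta`, and `dates` are illustrative only). The last lines show pd.to_datetime with the same format string as a column-wide alternative:

```python
from datetime import datetime

import pandas as pd


def parser(x):
    # Interpret the value as month/day/four-digit-year, e.g. '8/21/2010'.
    return datetime.strptime(str(x), "%m/%d/%Y")


# Parse both endpoints with the same format so they are comparable.
first = parser("5/8/2000")
last = parser("8/21/2010")

# Subtracting two datetime objects yields a datetime.timedelta.
delta = last - first
print(delta.days)

# Column-wide alternative: pandas parses a whole Series with the same format.
dates = pd.to_datetime(pd.Series(["5/8/2000", "8/21/2010"]), format="%m/%d/%Y")
print(dates.max() - dates.min())
```

Parsing a whole column with pd.to_datetime is usually more convenient than calling a Python function row by row, but both give the same dates here.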

troymyname00 answered Oct 04 '22 17:10