Pandas get first and last value of column from group

Tags: python, date, pandas, dataframe

Hi, I have a DataFrame that contains multiple rows for the same ID. One of the columns is a Date (in ascending order). I want to calculate the date difference between the first and last entries.

I am doing this by building a DataFrame with the pandas constructor as follows:

g = df.groupby('ID')

print(pd.DataFrame({'first':g.Date.nth(0), 'last':g.Date.nth(-1)}))

The first value is correct; however, the last value is nowhere near correct.

For example, for a specific ID, the first date is 2000-05-08 and the last date is 8/21/2010. The output is:

               first       last
ID                         
31965.0        2000-05-08  2002-12-29

2002-12-29 is somewhere in the middle.

Sample Data:

ID  Date
31965   5/8/2000
31965   5/10/2000
31965   5/18/2000
31965   5/22/2000
31965   5/23/2000
31965   5/25/2000
31965   5/30/2000
31965   6/7/2000
31965   6/8/2000
31965   6/11/2000
31965   6/13/2000
.....
31965   4/11/2009
31965   5/9/2009
31965   5/16/2009
31965   5/23/2009
31965   2/5/2010
31965   2/26/2010
31965   3/13/2010
31965   4/10/2010
31965   8/21/2010

I want my result for ID 31965 to be 5/8/2000 and 8/21/2010, so that I can eventually work out the date difference.

asked Mar 05 '18 00:03

David

2 Answers

You can do this in one step; just be sure your 'Date' column has dtype datetime:

df['Date'] = pd.to_datetime(df['Date'])

df.groupby('ID')['Date'].agg(['first','last'])
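A minimal runnable sketch of the two lines above, using a few of the question's sample rows plus a second ID (31966, hypothetical, only there so there is more than one group). The explicit format string is an assumption based on the m/d/Y dates shown in the question.

```python
import pandas as pd

# Subset of the question's rows, plus a hypothetical ID 31966
df = pd.DataFrame({
    "ID": [31965, 31965, 31965, 31966, 31966],
    "Date": ["5/8/2000", "6/13/2000", "8/21/2010", "1/2/2001", "3/4/2005"],
})

# Convert the text dates to datetime64 so they compare and subtract as dates
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# 'first' and 'last' take the first/last non-null value of each group,
# in the rows' current order; the result is indexed by ID
result = df.groupby("ID")["Date"].agg(["first", "last"])
print(result)
```

On the question's own approach: in pandas 2.0 and later, `GroupBy.nth` behaves as a filter and returns the selected rows under their original row index rather than the group keys, so putting two `nth` results into a `pd.DataFrame` constructor may not line up by ID the way it did in older versions. `first`/`last` are always indexed by the group key.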

Now, I suspect your data isn't ordered correctly, but if you still want the earliest and latest dates, you can do this:

df.groupby('ID')['Date'].agg(['min','max']).rename(columns={'min':'first','max':'last'})
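To illustrate why row order matters, here is a sketch with hypothetical rows for one ID that are deliberately not in chronological order (chosen so the positional "last" reproduces the 2002-12-29 symptom from the question). It also uses named aggregation, an alternative to `agg(['min','max']).rename(...)` that names the output columns directly.

```python
import pandas as pd

# Hypothetical rows for one ID whose physical order is NOT chronological
df = pd.DataFrame({
    "ID": [31965] * 4,
    "Date": pd.to_datetime(["2000-05-08", "2010-08-21", "2000-06-07", "2002-12-29"]),
})

g = df.groupby("ID")["Date"]

# Positional: whatever rows happen to come first and last in the frame
positional = g.agg(["first", "last"])

# Chronological: earliest and latest dates, regardless of row order;
# named aggregation produces columns called 'first' and 'last'
chronological = g.agg(first="min", last="max")

print(positional)
print(chronological)
```

Here `positional` reports 2002-12-29 as the last date, while `chronological` reports 2010-08-21.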

Or you can use sort_values first:

df.sort_values('Date').groupby('ID')['Date'].agg(['first','last'])
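Since the end goal in the question is the date difference, here is a sketch that sorts, takes first/last, and then subtracts. The three unsorted rows are hypothetical, but the endpoints match the dates the asker expects for ID 31965.

```python
import pandas as pd

# Hypothetical unsorted rows using the question's expected endpoints
df = pd.DataFrame({
    "ID": [31965, 31965, 31965],
    "Date": pd.to_datetime(["2010-08-21", "2000-05-08", "2002-12-29"]),
})

# Sort chronologically so 'first'/'last' become earliest/latest
span = df.sort_values("Date").groupby("ID")["Date"].agg(["first", "last"])

# Subtracting two datetime64 columns gives timedelta64 values;
# .dt.days extracts the whole number of days
span["days"] = (span["last"] - span["first"]).dt.days
print(span)
```

For 2000-05-08 to 2010-08-21 this gives 3757 days.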

answered Oct 03 '22 17:10

Scott Boston

You might have to parse the last date this way:

from datetime import datetime

def parser(x):
    # Parse a month/day/year string such as '8/21/2010' into a datetime
    return datetime.strptime(str(x), '%m/%d/%Y')

Here, you feed your date string into the function, and the function returns a parsed date. You can parse the first date the same way so that it is consistent with the last date; the only thing you might need to change is the format string '%m/%d/%Y'. That should solve your problem. Read this page for more information: https://docs.python.org/2/library/datetime.html
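A sketch of how such a parser could be applied to a whole column, next to the vectorized pandas equivalent `pd.to_datetime(..., format=...)` with the same format string. The two-row DataFrame is illustrative only; both approaches should yield the same dates.

```python
from datetime import datetime

import pandas as pd


def parser(x):
    # Parse a month/day/year string such as '8/21/2010'
    return datetime.strptime(str(x), "%m/%d/%Y")


df = pd.DataFrame({"ID": [31965, 31965], "Date": ["5/8/2000", "8/21/2010"]})

# Option 1: apply the hand-written parser value by value
parsed_manual = df["Date"].apply(parser)

# Option 2: let pandas parse the whole column with the same format string
parsed_vectorized = pd.to_datetime(df["Date"], format="%m/%d/%Y")

print(parsed_manual)
print(parsed_vectorized)
```

The vectorized form is usually preferable on large frames, and passing `format` explicitly avoids pandas guessing the layout of ambiguous dates.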

answered Oct 04 '22 17:10

troymyname00
