Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Add days to dates in dataframe

I am stymied at the moment. I am sure that I am missing something simple, but how do you move a series of dates forward by x units? In my more specific case I want to add 180 days to a date series within a dataframe.

Here is what I have so far:

import pandas, numpy, StringIO, datetime   txt = '''ID,DATE 002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00 002691c9cec109e64558848f1358ac16,2003-08-13 00:00:00 0088f218a1f00e0fe1b94919dc68ec33,2006-05-07 00:00:00 0088f218a1f00e0fe1b94919dc68ec33,2006-06-03 00:00:00 00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00 00d34668025906d55ae2e529615f530a,2006-03-09 00:00:00 0101d3286dfbd58642a7527ecbddb92e,2007-10-13 00:00:00 0101d3286dfbd58642a7527ecbddb92e,2007-10-27 00:00:00 0103bd73af66e5a44f7867c0bb2203cc,2001-02-01 00:00:00 0103bd73af66e5a44f7867c0bb2203cc,2008-01-20 00:00:00 ''' df = pandas.read_csv(StringIO.StringIO(txt)) df = df.sort('DATE') df.DATE = pandas.to_datetime(df.DATE) df['X_DATE'] = df['DATE'].shift(180, freq=pandas.datetools.Day) 

This code generates a type error. For reference I am using:

Python 2.7.4 Pandas '0.12.0.dev-6e7c4d6' Numpy '1.7.1'

like image 825
BigHandsome Avatar asked May 05 '13 14:05

BigHandsome


People also ask

How do I add days to a date in Python?

Add Days to datetime Object If we want to add days to a datetime object, we can use the timedelta() function of the datetime module. The previous Python syntax has created a new data object containing the datetime 2024-11-24 14:36:56, i.e. 18 days later than our input date.

How do I add days to a datetime?

The DateTime. AddDays() method in C# is used to add the specified number of days to the value of this instance. This method returns a new DateTime.

How do you subtract days in pandas?

When the function receives the date string it will first use the Pandas to_datetime() function to convert it to a Python datetime and it will then use the timedelta() function to subtract the number of days defined in the days variable.


2 Answers

If I understand you, you don't actually want shift, you simply want to make a new column next to the existing DATE which is 180 days after. In that case, you can use timedelta:

>>> from datetime import timedelta >>> df.head()                                  ID                DATE 8  0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 0  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 1  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 5  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 4  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 >>> df["X_DATE"] = df["DATE"] + timedelta(days=180) >>> df.head()                                  ID                DATE              X_DATE 8  0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 00:00:00 2001-07-31 00:00:00 0  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 2004-02-09 00:00:00 1  002691c9cec109e64558848f1358ac16 2003-08-13 00:00:00 2004-02-09 00:00:00 5  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 2006-09-05 00:00:00 4  00d34668025906d55ae2e529615f530a 2006-03-09 00:00:00 2006-09-05 00:00:00 

Does that help any?

like image 80
DSM Avatar answered Sep 30 '22 04:09

DSM


You could use pd.DateOffset. Which seems to be faster than timedelta.

In [930]: df['x_DATE'] = df['DATE'] + pd.DateOffset(days=180)  In [931]: df Out[931]:                                  ID       DATE     x_DATE 8  0103bd73af66e5a44f7867c0bb2203cc 2001-02-01 2001-07-31 0  002691c9cec109e64558848f1358ac16 2003-08-13 2004-02-09 1  002691c9cec109e64558848f1358ac16 2003-08-13 2004-02-09 4  00d34668025906d55ae2e529615f530a 2006-03-09 2006-09-05 5  00d34668025906d55ae2e529615f530a 2006-03-09 2006-09-05 2  0088f218a1f00e0fe1b94919dc68ec33 2006-05-07 2006-11-03 3  0088f218a1f00e0fe1b94919dc68ec33 2006-06-03 2006-11-30 6  0101d3286dfbd58642a7527ecbddb92e 2007-10-13 2008-04-10 7  0101d3286dfbd58642a7527ecbddb92e 2007-10-27 2008-04-24 9  0103bd73af66e5a44f7867c0bb2203cc 2008-01-20 2008-07-18 

Timings

Medium

In [948]: df.shape Out[948]: (10000, 3)  In [950]: %timeit df['DATE'] + pd.DateOffset(days=180) 1000 loops, best of 3: 1.51 ms per loop  In [949]: %timeit df['DATE'] + timedelta(days=180) 100 loops, best of 3: 2.71 ms per loop 

Large

In [952]: df.shape Out[952]: (100000, 3)  In [953]: %timeit df['DATE'] + pd.DateOffset(days=180) 100 loops, best of 3: 4.16 ms per loop  In [955]: %timeit df['DATE'] + timedelta(days=180) 10 loops, best of 3: 20 ms per loop 
like image 45
Zero Avatar answered Sep 30 '22 05:09

Zero