 

How to get date after subtracting days in pandas

Tags:

python

pandas

I have a dataframe:

In [15]: df
Out[15]: 
        date  day
0 2015-10-10   23
1 2015-12-19    9
2 2016-03-05   34
3 2016-09-17   23
4 2016-04-30    2

I want to subtract the number of days in the day column from the date column and store the result in a new column.

In [16]: df.dtypes
Out[16]: 
date    datetime64[ns]
day              int64

The desired output is something like:

In [15]: df
Out[15]: 
        date  day      date1
0 2015-10-10   23 2015-09-17
1 2015-12-19    9 2015-12-10
2 2016-03-05   34 2016-01-31
3 2016-09-17   23 2016-08-25
4 2016-04-30    2 2016-04-28

I tried this, but it does not work:

df['date1']=df['date']+pd.Timedelta(df['date'].dt.day-df['day'])

It throws an error:

TypeError: unsupported type for timedelta days component: Series
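The error comes from passing a whole Series to pd.Timedelta, which builds a single scalar duration and therefore expects one number. A minimal sketch (the sample values are only for illustration) showing the scalar call working and the Series call failing. The exact exception type and message depend on the pandas version, so the sketch catches both TypeError and ValueError:

```python
import pandas as pd

# pd.Timedelta builds ONE duration from a scalar value
shift = pd.Timedelta(23, unit='D')
print(shift)  # 23 days 00:00:00

# Passing a Series fails: older pandas raised TypeError,
# newer versions raise ValueError instead
days = pd.Series([23, 9, 34])
try:
    pd.Timedelta(days)
except (TypeError, ValueError) as exc:
    print(type(exc).__name__, exc)
```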

rey asked Oct 18 '16 09:10




2 Answers

You can use pd.to_timedelta, which converts the whole day column to timedeltas in a single vectorized call:

df['date1'] = df['date'] - pd.to_timedelta(df['day'], unit='d')

print (df)
        date  day      date1
0 2015-10-10   23 2015-09-17
1 2015-12-19    9 2015-12-10
2 2016-03-05   34 2016-01-31
3 2016-09-17   23 2016-08-25
4 2016-04-30    2 2016-04-28
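For reference, a self-contained version of the to_timedelta approach. The DataFrame construction is an assumed reconstruction of the sample data shown in the question:

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed construction)
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-10-10', '2015-12-19', '2016-03-05',
                            '2016-09-17', '2016-04-30']),
    'day': [23, 9, 34, 23, 2],
})

# Convert the whole integer column to timedelta64 in one call,
# then subtract element-wise from the datetime column
df['date1'] = df['date'] - pd.to_timedelta(df['day'], unit='D')
print(df)
```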

If you need to build each pd.Timedelta individually, use apply, but it is slower:

df['date1'] = df['date'] - df.day.apply(lambda x: pd.Timedelta(x, unit='D'))

print (df)
        date  day      date1
0 2015-10-10   23 2015-09-17
1 2015-12-19    9 2015-12-10
2 2016-03-05   34 2016-01-31
3 2016-09-17   23 2016-08-25
4 2016-04-30    2 2016-04-28
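If the goal is only to work with pd.Timedelta objects, the vectorized result already provides them: indexing into a timedelta64 Series returns a pd.Timedelta scalar. A small sketch, using a three-value subset of the sample day column, that checks this and compares the two conversions:

```python
import pandas as pd

days = pd.Series([23, 9, 34])

# Vectorized conversion: a timedelta64 Series
vectorized = pd.to_timedelta(days, unit='D')

# Per-element conversion via apply (one Python call per row)
per_element = days.apply(lambda x: pd.Timedelta(x, unit='D'))

# Individual elements of the vectorized result are already pd.Timedelta scalars
print(type(vectorized.iloc[0]).__name__)  # Timedelta

# Both approaches hold the same durations
print((vectorized == per_element).all())  # True
```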

Timings:

#[5000 rows x 2 columns]
df = pd.concat([df]*1000).reset_index(drop=True)

In [252]: %timeit df['date'] -  df.day.apply(lambda x: pd.Timedelta(x, unit='D'))
10 loops, best of 3: 45.3 ms per loop

In [253]: %timeit df['date'] -  pd.to_timedelta(df['day'], unit='d')
1000 loops, best of 3: 1.71 ms per loop
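%timeit is an IPython magic, so the timings above only reproduce inside IPython or Jupyter. A rough equivalent for a plain Python script, using the standard-library timeit module; the function names are hypothetical, and absolute numbers will vary with machine and pandas version:

```python
import timeit

import pandas as pd

# Sample frame from the question, tiled to 5000 rows as in the timings above
base = pd.DataFrame({
    'date': pd.to_datetime(['2015-10-10', '2015-12-19', '2016-03-05',
                            '2016-09-17', '2016-04-30']),
    'day': [23, 9, 34, 23, 2],
})
df = pd.concat([base] * 1000).reset_index(drop=True)


def with_apply():
    # Builds one pd.Timedelta per row
    return df['date'] - df['day'].apply(lambda x: pd.Timedelta(x, unit='D'))


def with_to_timedelta():
    # Converts the whole column in one vectorized call
    return df['date'] - pd.to_timedelta(df['day'], unit='D')


for func in (with_apply, with_to_timedelta):
    # Best of 3 repeats, 10 calls each; report the per-call time in ms
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1000:.2f} ms per call")
```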
jezrael answered Oct 13 '22 23:10


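import dateutil.relativedelta

def calculate_diff(v):
    # days= (plural) shifts the date back by v['day'] days;
    # day= (singular) would instead set the day of the month.
    # int() guards against NumPy integer types.
    return v['date'] - dateutil.relativedelta.relativedelta(days=int(v['day']))

df['date1'] = df.apply(calculate_diff, axis=1)

This assumes v['date'] is a datetime object; values from a datetime64 column qualify, since pd.Timestamp is a subclass of datetime. The keyword must be days (plural), not day.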
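The difference between relativedelta's singular and plural keywords matters here: singular keywords such as day= set an absolute field, while plural keywords such as days= shift the date. A small check using a plain datetime built from the question's first row:

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta

start = datetime(2015, 10, 10)

# days= (plural) is a relative offset: moves 23 days back
print(start - relativedelta(days=23))  # 2015-09-17 00:00:00

# day= (singular) is absolute: it replaces the day of the month with 23,
# so "subtracting" it actually lands later in the same month
print(start - relativedelta(day=23))  # 2015-10-23 00:00:00
```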

sachin saxena answered Oct 13 '22 23:10