
pandas: subtracting current date from the date in a pandas table

I am trying to calculate the difference in days between today's date and a date column in a pandas DataFrame of historical data. This is the code I intended to use:

df['diff'] = pd.to_datetime( df['date']) - pd.datetime.now().date()

However, it produces the following error:

TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'datetime.date'

The date column in the pandas table looks like this:

0       2018-12-18
1       2018-12-18
2       2018-12-18
3       2018-12-18
4       2018-12-18

How do I fix this error? Thanks in advance.

MathiasRa asked Dec 20 '18 11:12


2 Answers

You have to subtract values of the same type: datetimes from datetimes (with the time set to midnight) or dates from dates.

Use Timestamp.now with Timestamp.normalize or Timestamp.floor to remove the time component:

df['diff'] = pd.to_datetime(df['date']) - pd.Timestamp.now().normalize()

df['diff'] = pd.to_datetime(df['date']) - pd.Timestamp.now().floor('D')
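To see what "removing the time" means here, the following minimal sketch applies both methods to a single fixed timestamp. The timestamp value is an arbitrary assumption chosen for illustration, not something from the question.

```python
import pandas as pd

# Assumption: an arbitrary fixed timestamp, used instead of pd.Timestamp.now()
# so that the result is the same on every run.
ts = pd.Timestamp('2018-12-20 11:12:00')

midnight_a = ts.normalize()   # sets the time to 00:00:00
midnight_b = ts.floor('D')    # rounds down to the start of the day

print(midnight_a)                 # 2018-12-20 00:00:00
print(midnight_a == midnight_b)   # True
```

Both calls return a Timestamp at midnight of the same day, so the subtraction yields whole-day differences.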

You can also use replace:

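import datetime
# pd.datetime is no longer available in current pandas; use the standard library instead
dt = datetime.datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
df['diff'] = pd.to_datetime(df['date']) - dt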

Or convert the datetimes to dates so that both sides have the same type:

import datetime
dt = datetime.datetime.now().date()
df['diff'] = pd.to_datetime(df['date']).dt.date - dt

Sample (output from a run on 2018-12-20; the values depend on the current date):

rng = pd.date_range('2018-04-03', periods=10, freq='100D')
df = pd.DataFrame({'date': rng})

df['diff'] = pd.to_datetime(df['date']) - pd.Timestamp.now().normalize()
print(df)
        date      diff
0 2018-04-03 -261 days
1 2018-07-12 -161 days
2 2018-10-20  -61 days
3 2019-01-28   39 days
4 2019-05-08  139 days
5 2019-08-16  239 days
6 2019-11-24  339 days
7 2020-03-03  439 days
8 2020-06-11  539 days
9 2020-09-19  639 days
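Since the question asks for the difference in days, the timedelta column can be turned into plain integers with the .dt.days accessor. This is a minimal sketch on the same sample data; the fixed reference date and the 'days' column name are assumptions made so the result is reproducible.

```python
import pandas as pd

rng = pd.date_range('2018-04-03', periods=10, freq='100D')
df = pd.DataFrame({'date': rng})

# Assumption: a fixed reference date stands in for
# pd.Timestamp.now().normalize() so the output does not change between runs.
today = pd.Timestamp('2018-12-20')

df['diff'] = df['date'] - today    # timedelta64 column
df['days'] = df['diff'].dt.days    # integer number of days

print(df['days'].tolist())
# [-261, -161, -61, 39, 139, 239, 339, 439, 539, 639]
```

In real use, replace the fixed date with pd.Timestamp.now().normalize().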
jezrael answered Oct 17 '22 16:10


There is a subtle but important distinction. Pandas supports arithmetic between a datetime column and datetime.datetime objects, but not datetime.date objects:

from datetime import date, datetime

# TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'datetime.date'
df['date'] - date.today()

# works correctly
df['date'] - datetime.now()

# works correctly
df['date'] - datetime.now().replace(minute=0, hour=0, second=0, microsecond=0)

Note that pd.Timestamp.date returns a datetime.date object, as the docs specify: "Return date object with same year, month and day." That date object is not supported natively by pandas in the same way that datetime objects are.

But replacing time values is cumbersome. You will likely prefer using built-in pandas methods for your calculations. These are all equivalent:

df['date'] - pd.Timestamp('today').floor('D')
df['date'] - pd.Timestamp.today().normalize()
df['date'] - pd.to_datetime('today').normalize()
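If you already hold a datetime.date, such as the result of date.today(), one further option is to wrap it in pd.Timestamp before subtracting. This is a minimal sketch and not part of the original answer; the sample frame and the fixed reference date are assumptions for illustration.

```python
from datetime import date

import pandas as pd

# Assumption: a small sample frame with a datetime64 'date' column.
df = pd.DataFrame({'date': pd.to_datetime(['2018-12-18', '2018-12-25'])})

# Assumption: a fixed date stands in for date.today() to keep the result stable.
ref = date(2018, 12, 20)

# pd.Timestamp accepts a datetime.date and yields midnight of that day,
# which pandas can subtract from a datetime64 column.
ref_ts = pd.Timestamp(ref)
diff = df['date'] - ref_ts

print(diff.dt.days.tolist())   # [-2, 5]
```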
jpp answered Oct 17 '22 16:10