I am trying to calculate the difference in days between today's date and a pandas date column containing historical data. Below is the intended code:
df['diff'] = pd.to_datetime( df['date']) - pd.datetime.now().date()
However, it produces the following error:
TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'datetime.date'
The date column in the pandas table looks like this:
0 2018-12-18
1 2018-12-18
2 2018-12-18
3 2018-12-18
4 2018-12-18
How do I fix this error? Thanks in advance.
One approach is a small function that takes a date string: it first uses the pandas to_datetime() function to convert the string to a Timestamp (pandas' datetime type), and then uses timedelta() to subtract the number of days held in a days variable.
You can subtract a day from a Python date using a timedelta object. Create a timedelta with the amount of time you want to subtract, then subtract it from the date.
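The two snippets above can be sketched as follows. The function name subtract_days and its parameters are hypothetical, chosen only for illustration:

```python
from datetime import timedelta

import pandas as pd


def subtract_days(date_str: str, days: int) -> pd.Timestamp:
    """Parse date_str and return the Timestamp `days` days earlier.

    Hypothetical helper, not part of pandas.
    """
    parsed = pd.to_datetime(date_str)      # string -> pandas Timestamp
    return parsed - timedelta(days=days)   # Timestamp - timedelta -> Timestamp


result = subtract_days("2018-12-18", 7)
print(result.date())  # 2018-12-11
```

A pandas Timestamp accepts a standard-library timedelta directly, so no conversion is needed between the two.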
First, calculate the difference between the two dates. Second, convert the difference into the unit you want to use: 'D' for days, 'W' for weeks, 'M' for months, 'Y' for years.
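A minimal sketch of those two steps, assuming a fixed reference date so the result is reproducible. It sticks to days and weeks, which have a fixed length; months and years do not, so they are left out here:

```python
import pandas as pd

# Illustrative data and reference date (assumptions, not from the question)
dates = pd.Series(pd.to_datetime(["2018-12-18", "2019-01-01"]))
reference = pd.Timestamp("2018-12-25")

# Step 1: the difference is a Series of Timedelta values
diff = dates - reference

# Step 2: convert to the unit you want
days = diff.dt.days                    # whole days as integers
weeks = diff / pd.Timedelta(weeks=1)   # weeks as floats

print(days.tolist())   # [-7, 7]
print(weeks.tolist())  # [-1.0, 1.0]
```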
Use the strptime(date_str, format) function to convert a date string into a datetime object according to the given format. To get the difference between two dates, subtract date2 from date1. The result is a timedelta object.
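For example, using only the standard library (the two date strings are arbitrary sample values):

```python
from datetime import datetime

date1 = datetime.strptime("2018-12-18", "%Y-%m-%d")
date2 = datetime.strptime("2018-04-03", "%Y-%m-%d")

delta = date1 - date2   # a datetime.timedelta
print(delta.days)       # 259
```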
You have to subtract the same types: datetimes from datetimes (with the times set to zero) or dates from dates.
Use Timestamp.now with Timestamp.normalize or Timestamp.floor to remove the times:
df['diff'] = pd.to_datetime( df['date']) - pd.Timestamp.now().normalize()
df['diff'] = pd.to_datetime( df['date']) - pd.Timestamp.now().floor('d')
You can also use replace:
# pd.datetime was removed in pandas 2.0; pd.Timestamp.now() works in its place
dt = pd.Timestamp.now().replace(hour=0, minute=0, second=0, microsecond=0)
df['diff'] = pd.to_datetime( df['date']) - dt
Or convert the datetimes to dates so that you subtract the same types:
import datetime
dt = datetime.datetime.now().date()
df['diff'] = pd.to_datetime(df['date']).dt.date - dt
Sample:
rng = pd.date_range('2018-04-03', periods=10, freq='100D')
df = pd.DataFrame({'date': rng})
df['diff'] = pd.to_datetime( df['date']) - pd.Timestamp.now().normalize()
print (df)
        date      diff
0 2018-04-03 -261 days
1 2018-07-12 -161 days
2 2018-10-20  -61 days
3 2019-01-28   39 days
4 2019-05-08  139 days
5 2019-08-16  239 days
6 2019-11-24  339 days
7 2020-03-03  439 days
8 2020-06-11  539 days
9 2020-09-19  639 days
There is a subtle but important distinction. Pandas supports arithmetic with datetime.datetime objects but not with datetime.date objects:
from datetime import date, datetime
# TypeError: unsupported operand type(s) for -: 'DatetimeIndex' and 'datetime.date'
df['date'] - date.today()
# works correctly
df['date'] - datetime.now()
# works correctly
df['date'] - datetime.now().replace(minute=0, hour=0, second=0, microsecond=0)
Note that pd.Timestamp.date returns a datetime.date object. The docs do specify this: "Return date object with same year, month and day". That date object is not supported natively by Pandas in the same way datetime objects are supported.
But replacing time values by hand is cumbersome. You will likely prefer using built-in Pandas methods for your calculations. These are all equivalent:
df['date'] - pd.Timestamp('today').floor('D')
df['date'] - pd.Timestamp.today().normalize()
df['date'] - pd.to_datetime('today').normalize()
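As a quick check, the sketch below builds a small sample frame (the dates are made up) and confirms that the three expressions give the same midnight Timestamp and therefore the same differences. It assumes the script is not run exactly across midnight:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2018-12-18", "2019-01-28"])})

# Three ways to get today's date at midnight as a Timestamp
a = pd.Timestamp("today").floor("D")
b = pd.Timestamp.today().normalize()
c = pd.to_datetime("today").normalize()

diff_a = df["date"] - a
diff_b = df["date"] - b
diff_c = df["date"] - c

print(diff_a.equals(diff_b) and diff_b.equals(diff_c))
```

Which form to use is mostly a matter of taste; normalize() reads most directly as "drop the time part".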