I have a Pandas dataframe with 2 columns representing a start-timestamp and an end-timestamp:
start end
2016-06-13 2016-07-20
The datatype of these columns is datetime64[ns].
I now want to create a new column showing the difference in months:
start end duration
2016-06-13 2016-07-20 1.1
What I tried is the following:
df['duration'] = df['end'] - df['start']
The result looks like this:
start end duration
2016-06-13 2016-07-20 37 days 00:00:00.000000000
I then tried to do the following:
df['duration'] = (df['end'] - df['start']).dt.months
But this yields the following error:
AttributeError: 'TimedeltaProperties' object has no attribute 'months'
The datatype of the duration column is timedelta64[ns].
How can I achieve the desired result?
Because NumPy doesn’t have a physical quantities system in its core, the timedelta64 data type was created to complement datetime64. The arguments for timedelta64 are a number, to represent the number of units, and a date/time unit, such as (D)ay, (M)onth, (Y)ear, (h)ours, (m)inutes, or (s)econds.
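As a minimal sketch of the constructor described above, the snippet below builds a timedelta64 from a number and a unit code, and obtains another by subtracting two datetime64 values. The variable names are only illustrative, and the dates are the ones from the question.

```python
import numpy as np

# A timedelta64 is built from a count and a unit code.
four_days = np.timedelta64(4, "D")

# Subtracting two datetime64 values also produces a timedelta64.
start = np.datetime64("2016-06-13")
end = np.datetime64("2016-07-20")
elapsed = end - start  # 37 days

# Dividing by a one-unit timedelta gives a plain float in that unit.
elapsed_days = elapsed / np.timedelta64(1, "D")

print(start + four_days)  # 2016-06-17
print(elapsed_days)       # 37.0
```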
Python's standard library has its own timedelta class. Use it to add or subtract weeks, days, hours, minutes, seconds, milliseconds, and microseconds from a given date and time. Import the timedelta class from the datetime module and you are ready to use it. For example, adding timedelta(weeks=4) to a given date yields the date four weeks later.
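A short sketch of the four-week case, using the start date from the question as the given date (that choice of date is an assumption for illustration):

```python
from datetime import datetime, timedelta

given_date = datetime(2016, 6, 13)

# timedelta accepts weeks, days, hours, minutes, seconds,
# milliseconds and microseconds. It has no months argument.
future_date = given_date + timedelta(weeks=4)

print(future_date.date())  # 2016-07-11
```

Because the class has no months argument, it does not directly answer the question of a duration in months.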
As a side note: a timedelta64 doesn't have hours, days, or weeks if its units are months, years, or generic, only if they're weeks or smaller. I assume you've already got timedelta64 values in some relevant unit (usually days, seconds, milliseconds, or nanoseconds), so this isn't an issue for you, but it's something to be aware of.
There are two Timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.
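The special treatment of 'Y' and 'M' can be seen in a small experiment. This is only a sketch: it relies on NumPy's default casting rule when a timedelta64 is re-wrapped in another unit, and the variable names are illustrative.

```python
import numpy as np

# Days and hours have a fixed ratio, so this division is well defined.
one_day_in_hours = np.timedelta64(1, "D") / np.timedelta64(1, "h")

# Years and months also convert into each other exactly.
one_year = np.timedelta64(1, "Y")
as_months = np.timedelta64(one_year, "M")

# Converting a year into days is refused, because the answer
# depends on which year is meant.
try:
    np.timedelta64(one_year, "D")
    converted = True
except TypeError:
    converted = False

print(one_day_in_hours)  # 24.0
print(as_months)         # 12 months
print(converted)         # False
```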
import numpy as np #version: 1.16.2
import pandas as pd #version: 0.25.1
df['duration'] = (df['end'] - df['start'])/np.timedelta64(1, 'M')
The previous code no longer works in recent versions of NumPy.
import numpy as np #version: 1.18.5
import pandas as pd #version: 1.1.5
df['duration'] = (df['end'] - df['start']).astype('timedelta64[M]')/np.timedelta64(1, 'M')
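Since the behaviour of the 'M' unit has changed between releases, one possible alternative is to avoid it entirely. The sketch below is not taken from the answers above: it assumes that a "month" means an average Gregorian month (365.2425 / 12 days), and it also shows a count of whole calendar months that ignores the day of the month. The column name whole_months and the constant AVG_DAYS_PER_MONTH are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2016-06-13"]),
    "end": pd.to_datetime(["2016-07-20"]),
})

# Assumption: one month is the average Gregorian month length in days.
AVG_DAYS_PER_MONTH = 365.2425 / 12

# Convert the timedelta to fractional days, then to fractional months.
elapsed = df["end"] - df["start"]
df["duration"] = elapsed / pd.Timedelta(days=1) / AVG_DAYS_PER_MONTH

# Alternative: number of calendar month boundaries crossed,
# ignoring the day of the month.
df["whole_months"] = (
    (df["end"].dt.year - df["start"].dt.year) * 12
    + (df["end"].dt.month - df["start"].dt.month)
)

print(df)
```

For the row in the question, the 37 elapsed days come out at roughly 1.2 average months, and one calendar month boundary is crossed. Which definition is appropriate depends on what the duration is used for.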