I feel like this should be done very easily, yet I can't figure out how. I have a pandas
DataFrame
with column date:
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04
Name: date, dtype: datetime64[ns]
I want to have a columns of durations, something like:
0 0
1 80 days
2 1 day
3 15 days
4 1 day
Name: date, dtype: datetime64[ns]
My attempt yields bunch of 0 days and NaT
instead:
>>> df.date[1:] - df.date[:-1]
0 NaT
1 0 days
2 0 days
...
Any ideas?
Timedeltas
are useful here: (see docs)
Starting in v0.15.0, we introduce a new scalar type Timedelta, which is a subclass of datetime.timedelta, and behaves in a similar manner, but allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.
Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.
df
0
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04
You could:
pd.to_timedelta(df)
TimedeltaIndex(['0 days'], dtype='timedelta64[ns]', freq=None)
0 0
1 180
2 1
3 13
4 1
Name: 0, dtype: int64
Alternatively, you can calculate the difference between points in time using .shift()
(or .diff()
as illustrated by @Andy Hayden):
res = df-df.shift()
to get:
res.fillna(0)
0
0 0 days
1 180 days
2 1 days
3 13 days
4 1 days
You can convert these from timedelta64
dtype
to integer
using:
res.fillna(0).squeeze().dt.days
0 0
1 180
2 1
3 13
4 1
You can use diff:
In [11]: s
Out[11]:
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04
Name: date, dtype: datetime64[ns]
In [12]: s.diff()
Out[12]:
0 NaT
1 180 days
2 1 days
3 13 days
4 1 days
Name: date, dtype: timedelta64[ns]
In [13]: s.diff().fillna(0)
Out[13]:
0 0 days
1 180 days
2 1 days
3 13 days
4 1 days
Name: date, dtype: timedelta64[ns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With