Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dates to Durations in Pandas

I feel like this should be done very easily, yet I can't figure out how. I have a pandas DataFrame with column date:

0    2012-08-21
1    2013-02-17
2    2013-02-18
3    2013-03-03
4    2013-03-04
Name: date, dtype: datetime64[ns]

I want to have a columns of durations, something like:

0    0
1    80 days
2    1 day
3    15 days
4    1 day
Name: date, dtype: datetime64[ns]

My attempt yields bunch of 0 days and NaT instead:

>>> df.date[1:] - df.date[:-1]
0       NaT
1    0 days
2    0 days
...

Any ideas?

like image 478
alexsalo Avatar asked Dec 24 '15 01:12

alexsalo


2 Answers

Timedeltas are useful here: (see docs)

Starting in v0.15.0, we introduce a new scalar type Timedelta, which is a subclass of datetime.timedelta, and behaves in a similar manner, but allows compatibility with np.timedelta64 types as well as a host of custom representation, parsing, and attributes.

Timedeltas are differences in times, expressed in difference units, e.g. days, hours, minutes, seconds. They can be both positive and negative.

df

           0
0 2012-08-21
1 2013-02-17
2 2013-02-18
3 2013-03-03
4 2013-03-04

You could:

pd.to_timedelta(df)

TimedeltaIndex(['0 days'], dtype='timedelta64[ns]', freq=None)
0      0
1    180
2      1
3     13
4      1
Name: 0, dtype: int64

Alternatively, you can calculate the difference between points in time using .shift() (or .diff() as illustrated by @Andy Hayden):

res = df-df.shift()

to get:

res.fillna(0)

         0
0   0 days
1 180 days
2   1 days
3  13 days
4   1 days

You can convert these from timedelta64 dtype to integer using:

res.fillna(0).squeeze().dt.days

0      0
1    180
2      1
3     13
4      1
like image 81
Stefan Avatar answered Sep 29 '22 09:09

Stefan


You can use diff:

In [11]: s
Out[11]:
0   2012-08-21
1   2013-02-17
2   2013-02-18
3   2013-03-03
4   2013-03-04
Name: date, dtype: datetime64[ns]

In [12]: s.diff()
Out[12]:
0        NaT
1   180 days
2     1 days
3    13 days
4     1 days
Name: date, dtype: timedelta64[ns]

In [13]: s.diff().fillna(0)
Out[13]:
0     0 days
1   180 days
2     1 days
3    13 days
4     1 days
Name: date, dtype: timedelta64[ns]
like image 42
Andy Hayden Avatar answered Sep 29 '22 08:09

Andy Hayden