I am trying to add a column of deltaT to a dataframe where deltaT is the time difference between the successive rows (as indexed in the timeseries).
    time                 value
    2012-03-16 23:50:00      1
    2012-03-16 23:56:00      2
    2012-03-17 00:08:00      3
    2012-03-17 00:10:00      4
    2012-03-17 00:12:00      5
    2012-03-17 00:20:00      6
    2012-03-20 00:43:00      7
Desired result is something like the following (deltaT units shown in minutes):
    time                 value  deltaT
    2012-03-16 23:50:00      1       0
    2012-03-16 23:56:00      2       6
    2012-03-17 00:08:00      3      12
    2012-03-17 00:10:00      4       2
    2012-03-17 00:12:00      5       2
    2012-03-17 00:20:00      6       8
    2012-03-20 00:43:00      7      23
There are several ways to calculate the time difference between two dates in Python using Pandas. The first is to subtract one date from the other. This returns a timedelta such as 0 days 05:00:00 that tells us the number of days, hours, minutes, and seconds between the two dates.
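As a minimal sketch of that subtraction, the example below uses two made-up timestamps (the variable names `start` and `end` are illustrative, not from the question) that are five hours apart:

```python
import pandas as pd

# Two illustrative timestamps, five hours apart (values chosen for this example).
start = pd.Timestamp("2012-03-16 23:50:00")
end = pd.Timestamp("2012-03-17 04:50:00")

# Subtracting one Timestamp from another yields a pandas Timedelta.
delta = end - start
print(delta)  # 0 days 05:00:00
```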
Use df.dates1 - df.dates2 to find the difference between the two date columns, then convert the result to months.
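A rough sketch of one way to do this is shown here. The column names `dates1` and `dates2` come from the sentence above, the sample dates are invented, and "months" is assumed to mean the number of calendar-month boundaries between the two dates (day of month is ignored), since a Timedelta has no fixed month length:

```python
import pandas as pd

# Invented sample data; dates1 and dates2 are the column names used in the text.
df = pd.DataFrame({
    "dates1": pd.to_datetime(["2012-03-20", "2013-01-15"]),
    "dates2": pd.to_datetime(["2012-01-05", "2012-03-16"]),
})

# Plain subtraction gives a timedelta column (days, hours, ...).
df["delta"] = df["dates1"] - df["dates2"]

# Assumption: "months" means calendar months apart, ignoring the day of month.
df["months"] = (
    (df["dates1"].dt.year - df["dates2"].dt.year) * 12
    + (df["dates1"].dt.month - df["dates2"].dt.month)
)
print(df)
```

If fractional months are needed instead, a different convention (for example dividing the day count by an average month length) would have to be chosen explicitly.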
The difference between rows or columns of a pandas DataFrame is found using the diff() method. The axis parameter decides whether the difference is calculated between rows or between columns. When the periods parameter is positive, the difference is found by subtracting the earlier row from the later row.
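The following small sketch illustrates the axis and periods parameters on an invented two-column frame (the column names `a` and `b` are placeholders):

```python
import pandas as pd

# Invented numeric data to show how diff() behaves.
df = pd.DataFrame({"a": [1, 4, 9, 16], "b": [2, 3, 5, 8]})

# Default: axis=0, periods=1, so each row has the previous row subtracted.
row_diff = df.diff()

# axis=1: each column has the column to its left subtracted.
col_diff = df.diff(axis=1)

# Negative periods: each row has the following row subtracted.
back_diff = df.diff(periods=-1)

print(row_diff)
print(col_diff)
print(back_diff)
```

Positions with no neighbour to subtract (the first row, the first column, or the last row for negative periods) come back as NaN.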
To get a time difference in seconds, use the timedelta.total_seconds() method. Multiply the total seconds by 1000 to get the time difference in milliseconds. Divide the seconds by 60 to get the difference in minutes.
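These conversions can be sketched with the standard library alone. The two datetimes below are the last two timestamps from the question, and the variable names are illustrative:

```python
from datetime import datetime

# The last two timestamps from the question's data.
start = datetime(2012, 3, 17, 0, 20)
end = datetime(2012, 3, 20, 0, 43)

delta = end - start                # datetime.timedelta of 3 days, 23 minutes
seconds = delta.total_seconds()    # whole duration expressed in seconds
milliseconds = seconds * 1000      # seconds -> milliseconds
minutes = seconds / 60             # seconds -> minutes

print(seconds, milliseconds, minutes)
```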
Note: this uses numpy >= 1.7. For numpy < 1.7, see the conversion here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas
Your original frame, with a datetime index
    In [196]: df
    Out[196]:
                         value
    2012-03-16 23:50:00      1
    2012-03-16 23:56:00      2
    2012-03-17 00:08:00      3
    2012-03-17 00:10:00      4
    2012-03-17 00:12:00      5
    2012-03-17 00:20:00      6
    2012-03-20 00:43:00      7

    In [199]: df.index
    Out[199]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
    Length: 7, Freq: None, Timezone: None
Here is the timedelta64 of what you want
    In [200]: df['tvalue'] = df.index

    In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)

    In [202]: df
    Out[202]:
                         value              tvalue            delta
    2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00
    2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00
    2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00
    2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00
    2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00
    2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00
    2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00
Getting the answer while disregarding the day difference (your last date is 3/20, the prior one is 3/17) is actually tricky:
    In [204]: df['ans'] = df['delta'].apply(lambda x: x / np.timedelta64(1,'m')).astype('int64') % (24*60)

    In [205]: df
    Out[205]:
                         value              tvalue            delta  ans
    2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00    0
    2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00    6
    2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00   12
    2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00    2
    2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00    2
    2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00    8
    2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00   23
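For readers on a recent pandas, the same steps can be sketched as a self-contained script. This is not the answer's original code: it swaps the apply/lambda for floor division by a one-minute `pd.Timedelta`, and it fills the first row with `pd.Timedelta(0)` rather than `0`. The frame is rebuilt here from the question's data.

```python
import pandas as pd

# Rebuild the question's frame with a datetime index.
times = pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=times)

# Copy the index into a column so it can be shifted and subtracted.
df["tvalue"] = df.index
df["delta"] = (df["tvalue"] - df["tvalue"].shift()).fillna(pd.Timedelta(0))

# Whole minutes per row, then modulo one day to drop the day difference.
df["ans"] = (df["delta"] // pd.Timedelta(minutes=1)) % (24 * 60)
print(df)
```

The modulo step is what turns the 3 days 00:23:00 gap into 23; drop it to keep the full elapsed minutes.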
We can create a series with both index and values equal to the index keys using to_series, and then compute the differences between successive rows, which results in a timedelta64[ns] dtype. After obtaining this, via the .dt accessor we can access the seconds attribute of the time portion and finally divide each element by 60 to get the result in minutes (optionally filling the first value with 0).
    In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
        ...: df    # use .astype(int) to obtain integer values
    Out[13]:
                         value  deltaT
    time
    2012-03-16 23:50:00      1     0.0
    2012-03-16 23:56:00      2     6.0
    2012-03-17 00:08:00      3    12.0
    2012-03-17 00:10:00      4     2.0
    2012-03-17 00:12:00      5     2.0
    2012-03-17 00:20:00      6     8.0
    2012-03-20 00:43:00      7    23.0
Breaking this down, here is what diff returns:
    In [8]: ser_diff = df.index.to_series().diff()

    In [9]: ser_diff
    Out[9]:
    time
    2012-03-16 23:50:00               NaT
    2012-03-16 23:56:00   0 days 00:06:00
    2012-03-17 00:08:00   0 days 00:12:00
    2012-03-17 00:10:00   0 days 00:02:00
    2012-03-17 00:12:00   0 days 00:02:00
    2012-03-17 00:20:00   0 days 00:08:00
    2012-03-20 00:43:00   3 days 00:23:00
    Name: time, dtype: timedelta64[ns]
Seconds to minutes conversion:
    In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
    Out[10]:
    time
    2012-03-16 23:50:00     0.0
    2012-03-16 23:56:00     6.0
    2012-03-17 00:08:00    12.0
    2012-03-17 00:10:00     2.0
    2012-03-17 00:12:00     2.0
    2012-03-17 00:20:00     8.0
    2012-03-20 00:43:00    23.0
    Name: time, dtype: float64
If you also want to include the date portion, which was excluded previously (only the time portion was considered), dt.total_seconds gives the elapsed duration in seconds, from which minutes can again be calculated by division.
    In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
    Out[12]:
    time
    2012-03-16 23:50:00       0.0
    2012-03-16 23:56:00       6.0
    2012-03-17 00:08:00      12.0
    2012-03-17 00:10:00       2.0
    2012-03-17 00:12:00       2.0
    2012-03-17 00:20:00       8.0
    2012-03-20 00:43:00    4343.0    # <-- number of minutes in 3 days 23 minutes
    Name: time, dtype: float64
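To compare the two variants side by side, here is a self-contained sketch that rebuilds the question's frame and adds both columns. The column names `deltaT_time_only` and `deltaT_full` are made up for this illustration, and `.fillna(0)` after the division is used in place of `fill_value=0`, which should be equivalent for this data:

```python
import pandas as pd

# Rebuild the question's frame with a datetime index named "time".
times = pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=times)
df.index.name = "time"

# Differences between successive index values; the first entry is NaT.
ser_diff = df.index.to_series().diff()

# .dt.seconds looks only at the sub-day part of each difference.
df["deltaT_time_only"] = ser_diff.dt.seconds.div(60).fillna(0)

# .dt.total_seconds() includes whole days as well.
df["deltaT_full"] = ser_diff.dt.total_seconds().div(60).fillna(0)
print(df)
```

Which column is wanted depends on whether the multi-day gap in the last row should count as 23 minutes or as 4343 minutes.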