
Calculate time difference between Pandas Dataframe indices

I am trying to add a column of deltaT to a dataframe where deltaT is the time difference between the successive rows (as indexed in the timeseries).

time                 value
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7

Desired result is something like the following (deltaT units shown in minutes):

time                 value  deltaT
2012-03-16 23:50:00      1       0
2012-03-16 23:56:00      2       6
2012-03-17 00:08:00      3      12
2012-03-17 00:10:00      4       2
2012-03-17 00:12:00      5       2
2012-03-17 00:20:00      6       8
2012-03-20 00:43:00      7      23
ghpguru asked May 27 '13 17:05


People also ask

How do you get time difference in Pandas?

There are several ways to calculate the time difference between two dates in Python using Pandas. The first is to subtract one date from the other. This returns a timedelta such as 0 days 05:00:00 that tells us the number of days, hours, minutes, and seconds between the two dates.

How do you find the difference between two dates in a DataFrame in Python?

Use df.dates1 - df.dates2 to find the difference between the two date columns, then convert the result to months.

How do you find the difference between two rows in a data frame?

The difference between rows or columns of a pandas DataFrame is found with the diff() method. The axis parameter decides whether the difference is calculated between rows or between columns. When the periods parameter is positive, each row has the row that many positions before it subtracted from it.
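As a rough illustration of how axis and periods change the result, here is a minimal sketch. The frame and its column names "a" and "b" are made up for the example and are not part of the question.

```python
import pandas as pd

# Hypothetical data, used only to show the effect of the parameters.
df = pd.DataFrame({"a": [1, 4, 9, 16], "b": [10, 20, 40, 80]})

# Default (axis=0, periods=1): each row minus the previous row.
# The first row has no predecessor, so it becomes NaN.
row_diff = df.diff()

# axis=1: each column minus the column to its left.
col_diff = df.diff(axis=1)

# Negative periods: each row minus the row that follows it.
back_diff = df.diff(periods=-1)

print(row_diff)
print(col_diff)
print(back_diff)
```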

How do you find the difference in time in Python?

To get a time difference in seconds, use the timedelta.total_seconds() method. Multiply the total seconds by 1000 to get the time difference in milliseconds. Divide the seconds by 60 to get the difference in minutes.
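The conversions just described can be sketched with the standard library alone. The two timestamps are borrowed from the last two rows of the question's data; the variable names are only for illustration.

```python
from datetime import datetime

start = datetime(2012, 3, 17, 0, 20)
end = datetime(2012, 3, 20, 0, 43)

delta = end - start                  # a datetime.timedelta of 3 days and 23 minutes

seconds = delta.total_seconds()      # whole duration in seconds, days included
milliseconds = seconds * 1000
minutes = seconds / 60

print(seconds, milliseconds, minutes)
```

Note that total_seconds() covers the whole duration, while the timedelta.seconds attribute holds only the part left over after whole days are removed.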


2 Answers

Note: this uses numpy >= 1.7. For numpy < 1.7, see the conversion here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas

Your original frame, with a datetime index

In [196]: df
Out[196]:
                     value
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7

In [199]: df.index
Out[199]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
Length: 7, Freq: None, Timezone: None

Here is the timedelta64 of what you want

In [200]: df['tvalue'] = df.index

In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)

In [202]: df
Out[202]:
                     value              tvalue            delta
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00

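Getting the answer out while disregarding the day difference (your last date is 3/20, the one before it is 3/17) is actually tricky. Convert each delta to whole minutes, then take the result modulo the number of minutes in a day: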

In [204]: df['ans'] = df['delta'].apply(lambda x: x  / np.timedelta64(1,'m')).astype('int64') % (24*60)

In [205]: df
Out[205]:
                     value              tvalue            delta  ans
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00    0
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00    6
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00   12
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00    2
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00    2
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00    8
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00   23
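The session above dates from 2013. On a recent pandas, the same idea can probably be written without apply. This is a sketch, not the answerer's code: it assumes a pandas version where a timedelta Series can be floor-divided by pd.Timedelta, and it fills the first gap with pd.Timedelta(0) rather than the integer 0.

```python
import pandas as pd

# Rebuild the frame from the question.
index = pd.to_datetime([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=index)

# Gap to the previous row; the first row has none, so fill it with a zero duration.
df["tvalue"] = df.index
df["delta"] = (df["tvalue"] - df["tvalue"].shift()).fillna(pd.Timedelta(0))

# Whole minutes per gap, wrapped at one day so the day part is disregarded.
df["ans"] = (df["delta"] // pd.Timedelta(minutes=1)) % (24 * 60)

print(df)
```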
Jeff answered Sep 24 '22 03:09


We can use to_series to create a series whose index and values are both the index keys, then compute the differences between successive rows, which gives a timedelta64[ns] dtype. From there, the .dt accessor exposes the seconds attribute, which holds only the time portion of each difference (whole days are excluded). Dividing each element by 60 gives the result in minutes, and fill_value=0 optionally sets the first value to 0.

In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
    ...: df                                 # use .astype(int) to obtain integer values
Out[13]:
                     value  deltaT
time
2012-03-16 23:50:00      1     0.0
2012-03-16 23:56:00      2     6.0
2012-03-17 00:08:00      3    12.0
2012-03-17 00:10:00      4     2.0
2012-03-17 00:12:00      5     2.0
2012-03-17 00:20:00      6     8.0
2012-03-20 00:43:00      7    23.0

simplification:

When we perform diff:

In [8]: ser_diff = df.index.to_series().diff()

In [9]: ser_diff
Out[9]:
time
2012-03-16 23:50:00               NaT
2012-03-16 23:56:00   0 days 00:06:00
2012-03-17 00:08:00   0 days 00:12:00
2012-03-17 00:10:00   0 days 00:02:00
2012-03-17 00:12:00   0 days 00:02:00
2012-03-17 00:20:00   0 days 00:08:00
2012-03-20 00:43:00   3 days 00:23:00
Name: time, dtype: timedelta64[ns]

Seconds to minutes conversion:

In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
Out[10]:
time
2012-03-16 23:50:00     0.0
2012-03-16 23:56:00     6.0
2012-03-17 00:08:00    12.0
2012-03-17 00:10:00     2.0
2012-03-17 00:12:00     2.0
2012-03-17 00:20:00     8.0
2012-03-20 00:43:00    23.0
Name: time, dtype: float64

If you also want to include the date portion, which was excluded above (only the time portion was considered), dt.total_seconds() gives the full elapsed duration in seconds, from which minutes can again be calculated by division.

In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
Out[12]:
time
2012-03-16 23:50:00       0.0
2012-03-16 23:56:00       6.0
2012-03-17 00:08:00      12.0
2012-03-17 00:10:00       2.0
2012-03-17 00:12:00       2.0
2012-03-17 00:20:00       8.0
2012-03-20 00:43:00    4343.0    # <-- number of minutes in 3 days 23 minutes
Name: time, dtype: float64
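To try both variants outside an interactive session, the steps above can be collected into one script. This is a sketch that rebuilds the question's frame by hand. The column name deltaT_total and the variable name gaps are chosen here for illustration and do not appear in the answer.

```python
import pandas as pd

# Rebuild the frame from the question, with the index named "time".
index = pd.DatetimeIndex([
    "2012-03-16 23:50:00", "2012-03-16 23:56:00", "2012-03-17 00:08:00",
    "2012-03-17 00:10:00", "2012-03-17 00:12:00", "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
], name="time")
df = pd.DataFrame({"value": range(1, 8)}, index=index)

# Difference between successive index entries; the first entry is NaT.
gaps = df.index.to_series().diff()

# Time portion only: whole days are dropped, so the last gap is 23 minutes.
df["deltaT"] = gaps.dt.seconds.div(60, fill_value=0).astype(int)

# Full duration: whole days are counted, so the last gap is 4343 minutes.
df["deltaT_total"] = gaps.dt.total_seconds().div(60, fill_value=0).astype(int)

print(df)
```

Which column is appropriate depends on whether the multi-day gap should wrap (as in the question's desired output) or be reported in full.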
Nickil Maveli answered Sep 24 '22 03:09