Calculate the time difference between two consecutive rows in pandas

Tags: python, pandas

I have a pandas dataframe as follows:

Dev_id     Time
88345      13:40:31
87556      13:20:33
88955      13:05:00
.....      ........
85678      12:15:28

The dataframe has 83000 rows. I want to compute the time difference between each pair of consecutive rows and store it in a separate column. The desired result would be:

Dev_id     Time          Time_diff (in min)
88345      13:40:31      20
87556      13:20:33      15
88955      13:05:00      15

I tried df['Time_diff'] = df['Time'].diff(-1), but it raises the following error:

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

How can I solve this?

pythondumb asked Jan 03 '19 10:01


1 Answer

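The problem is that diff needs datetimes or timedeltas, and the Time column holds datetime.time objects, which do not support subtraction. So first convert the column with to_timedelta, then take the difference, get total_seconds and divide by 60 to express it in minutes: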

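import pandas as pd

# Convert the time values to timedeltas, diff against the next row, express in minutes
df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
# Alternative: go through datetimes instead of timedeltas
#df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print(df)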
   Dev_id      Time  Time_diff
0   88345  13:40:31  19.966667
1   87556  13:20:33  15.550000
2   88955  13:05:00  49.533333
3   85678  12:15:28        NaN
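
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds the four sample rows from the question and assumes, based on the TypeError message, that the Time column holds datetime.time objects. The variable name as_timedelta is only illustrative.

```python
import datetime as dt

import pandas as pd

# Sample rows copied from the question. Storing the times as datetime.time
# objects is an assumption inferred from the TypeError in the question.
df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": [
        dt.time(13, 40, 31),
        dt.time(13, 20, 33),
        dt.time(13, 5, 0),
        dt.time(12, 15, 28),
    ],
})

# str(datetime.time) yields "HH:MM:SS", which pd.to_timedelta can parse.
as_timedelta = pd.to_timedelta(df["Time"].astype(str))

# diff(-1) subtracts the next row from the current row, so the last row,
# which has no successor, ends up as NaN.
df["Time_diff"] = as_timedelta.diff(-1).dt.total_seconds().div(60)
print(df)
```

Because diff(-1) looks at the following row, the values are positive when the times are sorted in descending order, as they are in the sample.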

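If you want the result floored to whole minutes (use .dt.round instead of .dt.floor to round to the nearest minute):

# 'min' is the minute alias; the older 'T' alias is deprecated in recent pandas versions
df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
                     .diff(-1)
                     .dt.floor('min')
                     .dt.total_seconds()
                     .div(60))
print(df)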
   Dev_id      Time  Time_diff
0   88345  13:40:31       19.0
1   87556  13:20:33       15.0
2   88955  13:05:00       49.0
3   85678  12:15:28        NaN
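
The rounding variant is not shown above, so here is a small sketch of it under the same assumptions. It builds the sample times directly as strings for brevity; the names times, diffs, floored and rounded are illustrative only.

```python
import pandas as pd

# Sample times from the question, given directly as "HH:MM:SS" strings.
times = pd.Series(["13:40:31", "13:20:33", "13:05:00", "12:15:28"])

# Difference between each row and the next one, as timedeltas.
diffs = pd.to_timedelta(times).diff(-1)

# Floor drops the leftover seconds; round goes to the nearest whole minute.
floored = diffs.dt.floor("min").dt.total_seconds().div(60)
rounded = diffs.dt.round("min").dt.total_seconds().div(60)

print(pd.DataFrame({"Time": times, "floored": floored, "rounded": rounded}))
```

With these sample rows, flooring gives 19, 15 and 49 minutes, while rounding gives 20, 16 and 50.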

jezrael answered Nov 12 '22 21:11