I have a pandas DataFrame as follows:
Dev_id Time
88345 13:40:31
87556 13:20:33
88955 13:05:00
..... ........
85678 12:15:28
The above DataFrame has 83000 rows. I want to take the time difference between each pair of consecutive rows and store it in a separate column. The desired result would be:
Dev_id Time Time_diff(in min)
88345 13:40:31 20
87556 13:20:33 15
88955 13:05:00 15
I have tried df['Time_diff'] = df['Time'].diff(-1)
but I get the error shown below:
TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'
How can I solve this?
The difference between rows or columns of a pandas DataFrame is computed with the diff() method. The axis parameter decides whether the difference is calculated between rows or between columns. When the periods parameter is positive, each row has the row that many positions before it subtracted from it; when it is negative, the row that many positions after it is subtracted instead.
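As a small illustration of how the sign of periods changes the direction of the subtraction, here is a minimal sketch on a made-up Series (the values are hypothetical and not taken from the question):

```python
import pandas as pd

# Hypothetical values, chosen only to make the arithmetic easy to follow
s = pd.Series([10, 7, 3, 1])

# periods=1 (the default): each row minus the previous row,
# so the first row has nothing to subtract and becomes NaN
forward = s.diff()

# periods=-1: each row minus the next row,
# so the last row has nothing to subtract and becomes NaN
backward = s.diff(-1)

print(forward.tolist())   # [nan, -3.0, -4.0, -2.0]
print(backward.tolist())  # [3.0, 4.0, 2.0, nan]
```

Since the data in the question is sorted from latest to earliest, diff(-1) is the variant that yields positive differences.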
To get the difference between two points in time, subtract one datetime from the other. The result is a timedelta object, which represents a duration with microsecond resolution. To get the difference in seconds, call the timedelta's total_seconds() method. Note that plain datetime.time objects do not support subtraction, which is what causes the TypeError above.
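In plain Python, one way to sketch this is to attach the same date to both times before subtracting. The anchor date below is an arbitrary placeholder, assumed only for illustration:

```python
from datetime import date, datetime, time

t1 = time(13, 40, 31)
t2 = time(13, 20, 33)

# time objects cannot be subtracted directly, so combine both
# with the same (arbitrary, assumed) date to get datetimes
anchor = date(2000, 1, 1)
delta = datetime.combine(anchor, t1) - datetime.combine(anchor, t2)

print(delta)                       # 0:19:58
print(delta.total_seconds())       # 1198.0
print(delta.total_seconds() / 60)  # roughly 19.97 minutes
```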
Example #1: Use the subtract() function to subtract the corresponding element of a Series from each element of a DataFrame.
By using apply with axis=1, we can run a function on every row of a DataFrame. This approach still loops over the rows, but apply is generally faster than iterrows.
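A minimal sketch of row-wise apply, using two rows shaped like the question's data. The helper minutes_since_midnight and the Minutes column are hypothetical names introduced here for illustration:

```python
import pandas as pd
from datetime import time

df = pd.DataFrame({
    "Dev_id": [88345, 87556],
    "Time": [time(13, 40, 31), time(13, 20, 33)],
})

def minutes_since_midnight(row):
    # row is a Series holding one row of the DataFrame
    t = row["Time"]
    return t.hour * 60 + t.minute + t.second / 60

# axis=1 passes each row (rather than each column) to the function
df["Minutes"] = df.apply(minutes_since_midnight, axis=1)
print(df)
```

For differences between consecutive rows, a vectorized diff() is usually the better fit, since a row-wise function only sees one row at a time.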
The problem is that pandas needs datetimes or timedeltas for the diff function, so first convert the column with to_timedelta, then get total_seconds and divide by 60:
df['Time_diff'] = pd.to_timedelta(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
#alternative
#df['Time_diff'] = pd.to_datetime(df['Time'].astype(str)).diff(-1).dt.total_seconds().div(60)
print (df)
Dev_id Time Time_diff
0 88345 13:40:31 19.966667
1 87556 13:20:33 15.550000
2 88955 13:05:00 49.533333
3 85678 12:15:28 NaN
If you want to floor or round to whole minutes:
# 'min' is the minute frequency alias; the older 'T' alias is deprecated in recent pandas
df['Time_diff'] = (pd.to_timedelta(df['Time'].astype(str))
                     .diff(-1)
                     .dt.floor('min')
                     .dt.total_seconds()
                     .div(60))
print (df)
Dev_id Time Time_diff
0 88345 13:40:31 19.0
1 87556 13:20:33 15.0
2 88955 13:05:00 49.0
3 85678 12:15:28 NaN
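The rounding variant is mentioned but not shown. A self-contained sketch, assuming the Time column holds datetime.time objects as the error message suggests, and rebuilding only the four rows visible in the question:

```python
import pandas as pd
from datetime import time

# The four rows visible in the question, rebuilt for a runnable example
df = pd.DataFrame({
    "Dev_id": [88345, 87556, 88955, 85678],
    "Time": [time(13, 40, 31), time(13, 20, 33),
             time(13, 5, 0), time(12, 15, 28)],
})

# Convert the time objects to timedeltas, then subtract the next row from each row
diffs = pd.to_timedelta(df["Time"].astype(str)).diff(-1)

# Round to the nearest whole minute instead of flooring
df["Time_diff"] = diffs.dt.round("min").dt.total_seconds().div(60)

print(df["Time_diff"].tolist())  # [20.0, 16.0, 50.0, nan]
```

Rounding turns 19 min 58 s into 20, whereas flooring gives 19, so the choice depends on which convention the report needs.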