Calculate difference between 'times' rows in DataFrame Pandas

My DataFrame has the following form:

       TimeWeek   TimeSat  TimeHoli
0      6:40:00   8:00:00   8:00:00
1      6:45:00   8:05:00   8:05:00
2      6:50:00   8:09:00   8:10:00
3      6:55:00   8:11:00   8:14:00
4      6:58:00   8:13:00   8:17:00
5      7:40:00   8:15:00   8:21:00

I need to find the time difference between consecutive rows in TimeWeek, TimeSat and TimeHoli. The output must be:

TimeWeekDiff   TimeSatDiff  TimeHoliDiff
00:05:00          00:05:00       00:05:00
00:05:00          00:04:00       00:05:00
00:05:00          00:02:00       00:04:00  
00:03:00          00:02:00       00:03:00
00:02:00          00:02:00       00:04:00 

I tried using df['TimeWeek'] - df['TimeWeek'].shift().fillna(0), but it throws an error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

This is probably because of the ':' characters in the columns. How do I resolve this?

Pragnya Srinivasan asked Apr 11 '15 02:04

1 Answer

The error is thrown because the values are strings rather than timestamps, and strings cannot be subtracted from each other. First convert them to timestamps:

import pandas as pd

# Parse every cell of every column into a pandas Timestamp
df2 = df.apply(lambda x: [pd.Timestamp(ts) for ts in x])

Each timestamp carries today's date by default, but the dates cancel out when you take the difference. This only holds as long as the times never cross midnight: differencing 23:55 and 00:05 on the same date would give a negative result.
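As an alternative to timestamps, the strings can be parsed as durations with pd.to_timedelta, which avoids attaching a date at all. This is only a sketch and not the approach used in this answer; the DataFrame is rebuilt from the question's sample data so the snippet runs on its own, and df_td is a name chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question, still stored as strings
df = pd.DataFrame({
    "TimeWeek": ["6:40:00", "6:45:00", "6:50:00", "6:55:00", "6:58:00", "7:40:00"],
    "TimeSat":  ["8:00:00", "8:05:00", "8:09:00", "8:11:00", "8:13:00", "8:15:00"],
    "TimeHoli": ["8:00:00", "8:05:00", "8:10:00", "8:14:00", "8:17:00", "8:21:00"],
})

# pd.to_timedelta parses each "H:MM:SS" string as a duration since midnight,
# so no calendar date is involved
df_td = df.apply(pd.to_timedelta)
```

df_td can then be differenced in the same way as df2.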

Once converted, simply difference the DataFrame:

>>> df2 - df2.shift()
   TimeWeek  TimeSat  TimeHoli
0       NaT      NaT       NaT
1  00:05:00 00:05:00  00:05:00
2  00:05:00 00:04:00  00:05:00
3  00:05:00 00:02:00  00:04:00
4  00:03:00 00:02:00  00:03:00
5  00:42:00 00:02:00  00:04:00
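The same subtraction can also be written with DataFrame.diff(), and add_suffix can produce the column names requested in the question. The following is a sketch under a few assumptions: the strings are parsed with an explicit "%H:%M:%S" format (which attaches a placeholder date instead of today's), and result is an illustrative name that does not appear in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "TimeWeek": ["6:40:00", "6:45:00", "6:50:00", "6:55:00", "6:58:00", "7:40:00"],
    "TimeSat":  ["8:00:00", "8:05:00", "8:09:00", "8:11:00", "8:13:00", "8:15:00"],
    "TimeHoli": ["8:00:00", "8:05:00", "8:10:00", "8:14:00", "8:17:00", "8:21:00"],
})

# Convert each column to datetimes; the date part is a placeholder
# and cancels out in the subtraction
df2 = df.apply(lambda col: pd.to_datetime(col, format="%H:%M:%S"))

# diff() subtracts the previous row from each row, like df2 - df2.shift()
diffs = df2.diff()

# Drop the all-NaT first row and rename the columns to TimeWeekDiff etc.
result = diffs.iloc[1:].add_suffix("Diff").reset_index(drop=True)
```

Note that with this data the last TimeWeekDiff value is 42 minutes, matching the output shown in this answer.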

Depending on your needs, you can drop the first row (which contains only NaT values):

(df2 - df2.shift()).iloc[1:, :]

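or you can fill the NaTs with a zero-length timedelta (a bare 0 is not reliably interpreted as a timedelta in newer pandas versions):

(df2 - df2.shift()).fillna(pd.Timedelta(0))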
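
If the times do cross midnight (the 23:55 and 00:05 case mentioned earlier), the negative gaps can be corrected by adding one day back. This is a rough sketch that assumes the rows are in chronological order and that consecutive rows are less than 24 hours apart; the sample times are hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical times that wrap past midnight, parsed as durations since midnight
times = pd.to_timedelta(pd.Series(["23:50:00", "23:55:00", "0:05:00", "0:10:00"]))

# Row-to-row differences; the wrap produces a large negative gap
diffs = times.diff()

# Wherever the gap is negative, replace it with the gap plus one day.
# NaT compares as False, so the first row is left untouched.
wrapped = diffs.mask(diffs < pd.Timedelta(0), diffs + pd.Timedelta(days=1))
```

Here the gap between 23:55 and 0:05 comes out as 10 minutes instead of minus 23 hours 50 minutes.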

Alexander answered Sep 21 '22 20:09