
How to calculate time between events in a pandas DataFrame

Original Question

I'm stuck on the following problem. I'm trying to figure out at which moments in time, and for how long, a vehicle is at the factory. I have an Excel sheet in which all events are stored; each event is either a delivery route or a maintenance event. The ultimate goal is a dataframe that gives, for each vehicle registration number, the arrival time at the factory and the time spent there (including maintenance actions). For people interested, this is because I ultimately want to be able to schedule non-critical maintenance actions on the vehicles.

An example of my dataframe would be:

  Registration RoutID       Date Dep Loc Arr Loc Dep Time Arr Time  Days
0         XC66    A58  20/May/17    Home   Loc A    10:54    21:56     0
1         XC66    A59  21/May/17   Loc A    Home    00:12    10:36     0
2         XC66   A345  21/May/17    Home   Loc B    12:41    19:16     0
3         XC66   A346  21/May/17   Loc B   Loc C    20:50    03:49     1
4         XC66   A347  22/May/17   Loc C    Home    06:10    07:40     0
5         XC66    #M1  22/May/17    Home    Home    10:51    13:00     0

I have written a script that processes the dates and times into proper datetime columns for the arrival and the departure. For the maintenance periods, where "Dep Loc" = Home and "Arr Loc" = Home, the following code singles out the relevant rows:

df_home = df[df["Dep Loc"].isin(["Home"])]
df_home = df_home[df_home["Arr Loc"].isin(["Home"])]

From here I can easily subtract the dates to create the duration column.

So far so good. However, I'm stuck on calculating the other times. There might be intermediate stops between two visits to the factory, so the .shift() function does not work: the number of rows to shift by is not constant.

I have searched on this matter, but I could only find shift-based solutions, or answers based on the duration of each event itself rather than on the time between events.

Any guidance in the right direction would be greatly appreciated!

Regards

Attempted Solution

I have been stuck on this question for a while now, but shortly after posting this question I tried this solution:

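# Assumes "Arr Time" and "Dep Time" already hold full datetimes
arrival_times = df["Arr Time"]
departure_times = df["Dep Time"]

for idx, loc in enumerate(df["Arr Loc"]):
    if loc == "Home":
        # Positions of all later rows in which the vehicle departs from Home
        later = (idx2 for idx2, obj in enumerate(df["Dep Loc"]) if obj == "Home" and idx2 > idx)
        idx_next = next(later, None)
        if idx_next is None:
            continue  # no later departure from Home, e.g. the last row

        # Time at the factory: next departure minus this arrival
        duration = departure_times.iloc[idx_next] - arrival_times.iloc[idx]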

Here I used the next function to find the next occurrence of Home as the starting location (i.e. the time the vehicle leaves the base). Subsequently I subtract the two datetimes to find the time difference.

It works for the small data set, but still not for the entire dataset.

asked Jul 25 '17 at 15:07 by jeff



1 Answer

After filtering the relevant rows, convert "Arr Time" and "Dep Time" to timestamps using the "Date" and "Days" columns:

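import pandas as pd

# Keep only the rows that start and end at Home (maintenance events);
# .copy() avoids a SettingWithCopyWarning on the assignments below
df_home = df[(df["Dep Loc"] == "Home") & (df["Arr Loc"] == "Home")].copy()

# Combine the date with each time of day and parse, e.g. "22/May/17 10:51"
fmt = "%d/%b/%y %H:%M"
df_home["Dep Time"] = pd.to_datetime(df_home["Date"] + " " + df_home["Dep Time"], format=fmt)
df_home["Arr Time"] = pd.to_datetime(df_home["Date"] + " " + df_home["Arr Time"], format=fmt)
df_home["Date"] = pd.to_datetime(df_home["Date"], format="%d/%b/%y")

# Assumption: "Days" counts the days between departure and arrival
# (row 3 departs at 20:50 and arrives at 03:49 with Days = 1), so it is added to the arrival
df_home["Arr Time"] = df_home["Arr Time"] + pd.to_timedelta(df_home["Days"], unit="D")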

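Finally, the difference between "Arr Time" and "Dep Time" gives the duration in minutes:

# .dt.total_seconds() is used because astype('timedelta64[m]') is rejected by recent pandas versions
df_home['diff_duration'] = (df_home['Arr Time'] - df_home['Dep Time']).dt.total_seconds() / 60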
answered Sep 17 '22 at 15:09 by sai kumar
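
The answer above covers the maintenance rows only. For the time between an arrival at Home and the next departure from Home, one possible approach is sketched below. It is not from the original posts and rests on several assumptions: "Days" is the day offset of the arrival, every trip of a vehicle is logged, and the Home-to-Home maintenance rows count as time at the factory and can therefore be dropped before pairing. The column names dep_dt, arr_dt, next_dep_dt and minutes_at_home are made up for illustration. Grouping by "Registration" keeps vehicles separate, which may be what the loop in the question is missing on the full dataset.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Registration": ["XC66"] * 6,
    "RoutID": ["A58", "A59", "A345", "A346", "A347", "#M1"],
    "Date": ["20/May/17", "21/May/17", "21/May/17", "21/May/17", "22/May/17", "22/May/17"],
    "Dep Loc": ["Home", "Loc A", "Home", "Loc B", "Loc C", "Home"],
    "Arr Loc": ["Loc A", "Home", "Loc B", "Loc C", "Home", "Home"],
    "Dep Time": ["10:54", "00:12", "12:41", "20:50", "06:10", "10:51"],
    "Arr Time": ["21:56", "10:36", "19:16", "03:49", "07:40", "13:00"],
    "Days": [0, 0, 0, 1, 0, 0],
})

# Build full datetimes; "Days" is assumed to shift the arrival to a later day
fmt = "%d/%b/%y %H:%M"
df["dep_dt"] = pd.to_datetime(df["Date"] + " " + df["Dep Time"], format=fmt)
df["arr_dt"] = (pd.to_datetime(df["Date"] + " " + df["Arr Time"], format=fmt)
                + pd.to_timedelta(df["Days"], unit="D"))

# Drop maintenance rows (Home -> Home): the vehicle never leaves the factory
is_maintenance = (df["Dep Loc"] == "Home") & (df["Arr Loc"] == "Home")
trips = df[~is_maintenance].sort_values(["Registration", "dep_dt"]).copy()

# Within each vehicle, the row after an arrival at Home is its next departure
trips["next_dep_dt"] = trips.groupby("Registration")["dep_dt"].shift(-1)

# Keep arrivals at Home and compute the stay; NaN means no later departure is logged
stays = trips.loc[trips["Arr Loc"] == "Home",
                  ["Registration", "arr_dt", "next_dep_dt"]].copy()
stays["minutes_at_home"] = (stays["next_dep_dt"] - stays["arr_dt"]).dt.total_seconds() / 60

print(stays)
```

On the sample data, the vehicle arrives at Home at 10:36 on 21 May and leaves again at 12:41, so the first stay is 125 minutes. The second stay has no later departure in the sample, so its duration is left empty rather than guessed.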