I am trying to create a rudimentary scheduling system. Here is what I have so far:
I have a pandas dataframe job_data that looks like this:
| wc | job | start | duration | 
|---|---|---|---|
| 1 | J1 | 2022-08-16 07:30:00 | 17 | 
| 1 | J2 | 2022-08-16 07:30:00 | 5 | 
| 2 | J3 | 2022-08-16 07:30:00 | 21 | 
| 2 | J4 | 2022-08-16 07:30:00 | 12 | 
It contains a wc (work center), job, a start date and duration for the job in hours.
I have created a function add_hours that takes the following arguments: start (datetime), hours (int).
It calculates the when the job will be complete based on the start time and duration.
The code for add_hours is:
def is_in_open_hours(dt):
    return (
        dt.weekday() in business_hours["weekdays"]
        and dt.date() not in holidays
        and business_hours["from"].hour <= dt.time().hour < business_hours["to"].hour
    )
def get_next_open_datetime(dt):
    while True:
        dt = dt + timedelta(days=1)
        if dt.weekday() in business_hours["weekdays"] and dt.date() not in holidays:
            dt = datetime.combine(dt.date(), business_hours["from"])
            return dt
def add_hours(dt, hours):
    while hours != 0:
        if is_in_open_hours(dt):
            dt = dt + timedelta(hours=1)
            hours = hours - 1
        else:
            dt = get_next_open_datetime(dt)
    return dt
The code to calculate the end column is:
df["end"] = df.apply(lambda x: add_hours(x.start, x.duration), axis=1)
The result of function is the end column:
| wc | job | start | duration | end | 
|---|---|---|---|---|
| 1 | J1 | 2022-08-16 07:30:00 | 17 | 2022-08-17 14:00:00 | 
| 1 | J2 | 2022-08-16 07:30:00 | 5 | 2022-08-17 10:00:00 | 
| 2 | J3 | 2022-08-16 07:30:00 | 21 | 2022-08-18 08:00:00 | 
| 2 | J4 | 2022-08-16 07:30:00 | 12 | 2022-08-18 08:00:00 | 
Problem is, I need the start datetime in the second row to be the end datetime from the previous row instead of them all using the same start date. I also need to start this process over for each wc.
So the desired output would be:
| wc | job | start | duration | end | 
|---|---|---|---|---|
| 1 | J1 | 2022-08-16 07:30:00 | 17 | 2022-08-17 14:00:00 | 
| 1 | J2 | 2022-08-17 14:00:00 | 5 | 2022-08-17 19:00:00 | 
| 2 | J3 | 2022-08-16 07:30:00 | 21 | 2022-08-18 08:00:00 | 
| 2 | J4 | 2022-08-18 08:00:00 | 10 | 2022-08-18 18:00:00 | 
You can use Timedelta and groupby operations.
As you did not provide your custom function, I'll apply here a simple addition of the duration:
df['start'] = pd.to_datetime(df['start'])
t = pd.to_timedelta(df['duration'], unit='h')
g = t.groupby(df['wc'])
df['start'] = df['start'].add(g.apply(lambda x: x.cumsum().shift(fill_value=pd.Timedelta('0'))))
df['end'] = df['start'].add(t)
Output:
   wc job               start  duration                 end
0   1  J1 2022-08-16 07:30:00        17 2022-08-17 00:30:00
1   1  J2 2022-08-17 00:30:00         5 2022-08-17 05:30:00
2   2  J3 2022-08-16 07:30:00        21 2022-08-17 04:30:00
3   2  J4 2022-08-17 04:30:00        12 2022-08-17 16:30:00
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With