Efficient Shift Scheduling in Python

Question

I'm working on shift-scheduling simulations for a model taxicab company. The company operates 350 cabs, and all are in use on any given day. Drivers each work 5 shifts of 12 hours each, and there are four overlapping shifts a day: 3:00-15:00, 15:00-3:00, 16:00-4:00, and 4:00-16:00. I originally wrote it in Python because I needed to develop it rapidly, and I thought the performance would be acceptable. The original parameters only required two shifts a day (3:00-15:00 and 15:00-3:00), and while performance was not great, it was good enough for my uses. It could make a weekly schedule for the drivers in about 8 minutes, using a simple brute-force algorithm that evaluates all potential swaps to see if the situation can be improved.

With the four overlapping shifts, performance is abysmal: it takes a little over an hour to produce a weekly schedule. I've done some profiling using cProfile, and the main culprits appear to be two methods. One determines whether placing a driver in a shift creates a conflict. It makes sure that the driver is not already serving in a shift on the same day, or in the preceding or following shifts. With only two shifts a day, this was easy: one simply had to check whether the driver was already scheduled to work in the shift directly before or after. With the four overlapping shifts, this has become more complicated. The second culprit is the method that determines whether a shift is a day or night shift. Again, with the original two shifts, this was as easy as checking whether the shift number was even or odd, with shift numbers beginning at 0. The first shift (shift 0) was designated a night shift, the next was day, and so on. Now the first two are night, the next two are day, and so on. The first method calls the second, so I will put both bodies below.

def in_conflict(shift, driver, shift_data):
    next_type = get_stype((shift+1) % 28)
    my_type = get_stype(shift)

    # nudge is 1 when the next shift has the other type, otherwise 0
    nudge = abs(next_type - my_type)

    # Negative indices wrap around to the end of the week via list indexing
    return (driver in shift_data[shift-2-nudge]
            or driver in shift_data[shift-1-nudge]
            or driver in shift_data[(shift+1-(nudge*2)) % 28]
            or driver in shift_data[(shift+2-nudge) % 28]
            or driver in shift_data[(shift+3-nudge) % 28])

Note that get_stype returns the type of the shift, with 0 indicating a night shift and 1 indicating a day shift.

In order to determine the shift type, I'm using this method:

def get_stype(k):
    if (k / 4.0) % 1.0 < 0.5:
        return 0
    else:
        return 1
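
For comparison, here is a minimal sketch of the same test written with integer arithmetic, plus a lookup table for the 28 weekly shifts. The names get_stype_float, get_stype_int and STYPE are hypothetical and not part of the original script; the sketch assumes shift numbers are plain integers. Whether it makes a measurable difference would need to be confirmed by profiling.

```python
def get_stype_float(k):
    # Original formulation from the question, kept only for comparison.
    if (k / 4.0) % 1.0 < 0.5:
        return 0
    return 1


def get_stype_int(k):
    # Shifts come in blocks of four: positions 0 and 1 are night (0),
    # positions 2 and 3 are day (1).
    return (k % 4) // 2


# Hypothetical precomputed table: STYPE[k] is the type of weekly shift k.
STYPE = tuple(get_stype_int(k) for k in range(28))
```

With the table in place, a call such as get_stype(i) could be replaced by an index lookup STYPE[i], which avoids the function call and the floating-point division.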

And here's the relevant output from cProfile:

     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   57662556   19.717    0.000   19.717    0.000 sim8.py:241(get_stype)
   28065503   55.650    0.000   77.591    0.000 sim8.py:247(in_conflict)
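
Since the set of shifts that conflict with a given shift depends only on the shift number, one option is to compute those indices once and reduce in_conflict to a handful of set lookups. The following is a sketch under that assumption, not a tested replacement: it reuses the offsets from in_conflict above, and the names N_SHIFTS, CONFLICTS and _conflicting_shifts are hypothetical.

```python
N_SHIFTS = 28  # 4 shifts a day, 7 days a week


def get_stype(k):
    # 0 = night, 1 = day; same result as the float-based version for integers.
    return (k % 4) // 2


def _conflicting_shifts(shift):
    # Same offsets as the original in_conflict, wrapped explicitly with a modulo
    # instead of relying on negative list indices.
    nudge = abs(get_stype((shift + 1) % N_SHIFTS) - get_stype(shift))
    return (
        (shift - 2 - nudge) % N_SHIFTS,
        (shift - 1 - nudge) % N_SHIFTS,
        (shift + 1 - 2 * nudge) % N_SHIFTS,
        (shift + 2 - nudge) % N_SHIFTS,
        (shift + 3 - nudge) % N_SHIFTS,
    )


# Built once at import time; CONFLICTS[s] lists the shifts that clash with s.
CONFLICTS = tuple(_conflicting_shifts(s) for s in range(N_SHIFTS))


def in_conflict(shift, driver, shift_data):
    # shift_data is assumed to be a list of 28 sets of driver ids.
    for other in CONFLICTS[shift]:
        if driver in shift_data[other]:
            return True
    return False
```

This removes both get_stype calls and the index arithmetic from the hot path, which is where the profile shows most of the time going.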

Does anyone have any sage advice or tips on how I might go about improving the performance of this script? Any help would be greatly appreciated!

Cheers,

Tim

EDIT: Sorry, I should have clarified that the data for each shift is stored as a set, i.e. shift_data[k] is a set.

EDIT 2:

Adding main loop, as per request below, along with other methods called. It's a bit of a mess, and I apologize for that.

def optimize_schedule(shift_data, driver_shifts, recheck):
    skip = set()

    if len(recheck) == 0:
        first_run = True
        recheck = []
        for i in xrange(28):
            recheck.append(set())
    else:
        first_run = False

    for i in xrange(28):

        if (first_run):
            targets = shift_data[i]
        else:
            targets = recheck[i]

        for j in targets:
            o_score = eval_score = opt_eval_at_coord(shift_data, driver_shifts, i, j)

            my_type = get_stype(i)
            s_type_fwd = get_stype((i+1) % 28)

            if (my_type == s_type_fwd):
                search_direction = (i + 2) % 28
                end_direction = i
            else:
                search_direction = (i + 1) % 28 
                end_direction = (i - 1) % 28 

            while True:
                if (end_direction == search_direction):
                    break
                for k in shift_data[search_direction]:

                    coord = search_direction * 10000 + k 

                    if coord in skip:
                        continue

                    if k in shift_data[i] or j in shift_data[search_direction]:
                        continue

                    if in_conflict(search_direction, j, shift_data) or in_conflict(i, k, shift_data):
                        continue

                    node_a_prev_score = o_score
                    node_b_prev_score = opt_eval_at_coord(shift_data, driver_shifts, search_direction, k)

                    if (node_a_prev_score == 1) and (node_b_prev_score == 1):
                        continue

                    a_type = my_type
                    b_type = get_stype(search_direction)

                    if (node_a_prev_score == 1):
                        if (driver_shifts[j]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            continue
                    elif (node_b_prev_score == 1):
                        if (driver_shifts[k]['type'] == 'any') and (a_type != b_type):
                            test_eval = 2
                        else:
                            test_eval = 0
                    else:
                        if (a_type == b_type):
                            test_eval = 0
                        else:
                            test_eval = 2

                    print 'eval_score: %f' % test_eval

                    if (test_eval > eval_score):

                        cand_coords = [search_direction, k]
                        eval_score = test_eval
                        if (test_eval == 2.0):
                            break
                else:
                    search_direction = (search_direction + 1) % 28
                    continue

                break

            if (eval_score > o_score):
                print 'doing a swap: ',
                print cand_coords,

                shift_data[i].remove(j)
                shift_data[i].add(cand_coords[1])

                shift_data[cand_coords[0]].add(j)   
                shift_data[cand_coords[0]].remove(cand_coords[1])

                if j in recheck[i]:
                    recheck[i].remove(j)

                if cand_coords[1] in recheck[cand_coords[0]]:               
                    recheck[cand_coords[0]].remove(cand_coords[1])

                recheck[cand_coords[0]].add(j)
                recheck[i].add(cand_coords[1])

            else:
                coord = i * 10000 + j
                skip.add(coord)

    if first_run:
        shift_data = optimize_schedule(shift_data, driver_shifts, recheck)

    return shift_data



def opt_eval_at_coord(shift_data, driver_shifts, i, j):
    node = j
    if in_conflict(i, node, shift_data):
        return float('-inf')
    else:
        s_type = get_stype(i)

        d_pref = driver_shifts[node]['type']

        if (s_type == 0 and d_pref == 'night') or (s_type == 1 and d_pref == 'day') or (d_pref == 'any'):
            return 1
        else:
            return 0

Thomas K · Accepted Answer

There's nothing that would obviously slow these functions down, and indeed they aren't slow. They just get called a lot. You say you're using a brute force algorithm - can you write an algorithm that doesn't try every possible combination? Or is there a more efficient way of doing it, like storing the data by driver rather than by shift?
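
As a rough illustration of the by-driver idea, the sketch below keeps a dictionary from each driver to the set of shifts they work, alongside the existing shift_data. It assumes the conflict offsets from the question's in_conflict; the names CONFLICTS, build_driver_index, in_conflict_by_driver and swap are hypothetical, and whether this is actually faster for your data would need to be measured.

```python
N_SHIFTS = 28


def get_stype(k):
    # 0 = night, 1 = day
    return (k % 4) // 2


def conflicting_shifts(shift):
    # Offsets taken from the question's in_conflict.
    nudge = abs(get_stype((shift + 1) % N_SHIFTS) - get_stype(shift))
    offsets = (-2 - nudge, -1 - nudge, 1 - 2 * nudge, 2 - nudge, 3 - nudge)
    return frozenset((shift + off) % N_SHIFTS for off in offsets)


CONFLICTS = [conflicting_shifts(s) for s in range(N_SHIFTS)]


def build_driver_index(shift_data):
    # Maps driver id -> set of shift numbers that driver currently works.
    index = {}
    for shift, drivers in enumerate(shift_data):
        for driver in drivers:
            index.setdefault(driver, set()).add(shift)
    return index


def in_conflict_by_driver(shift, driver, driver_index):
    # One set operation against the driver's (at most 5) shifts.
    return not CONFLICTS[shift].isdisjoint(driver_index.get(driver, ()))


def swap(shift_data, driver_index, shift_a, driver_a, shift_b, driver_b):
    # Exchange two drivers and keep both structures in sync.
    shift_data[shift_a].remove(driver_a)
    shift_data[shift_a].add(driver_b)
    shift_data[shift_b].remove(driver_b)
    shift_data[shift_b].add(driver_a)
    driver_index[driver_a].remove(shift_a)
    driver_index[driver_a].add(shift_b)
    driver_index[driver_b].remove(shift_b)
    driver_index[driver_b].add(shift_a)
```

The trade-off is that every swap has to update two structures instead of one, so this only pays off if conflict checks far outnumber swaps, as the call counts in the question suggest.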

Of course, if you need instant speedups, it might benefit from running in an interpreter like PyPy, or using Cython to convert critical parts to C.

Tags:

python

tabdulla
