
Implementing a dynamic multiple timeline queue

Introduction

I would like to implement a dynamic multiple timeline queue. The context here is scheduling in general.

What is a timeline queue?

This part is simple: it is a timeline of tasks, where each task has a start and an end time. Tasks are grouped into jobs. The tasks of a job must keep their order, but the job can be moved around in time as a whole. For example, it could be represented as:

 --t1--   ---t2.1-----------t2.2-------
 '    '   '        '                  '
20    30  40       70                120 

I would implement this as a heap queue with some additional constraints. The Python sched module has some basic approaches in this direction.

Definition of a multiple timeline queue

Each queue stands for one resource, and each task needs a resource. Graphical example:

R1  --t1.1----- --t2.2-----      -----t1.3--    
            /  \                /
R2  --t2.1--     ------t1.2-----


Explaining "dynamic"

It becomes interesting when a task can use one of multiple resources. An additional constraint is that consecutive tasks that can run on the same resource must use the same resource.

Example: If (from above) task t1.3 can run on R1 or R2, the queue should look like:

R1  --t1.1----- --t2.2-----      
            /  \                
R2  --t2.1--     ------t1.2----------t1.3--    


Functionality (in priority order)

  • FirstFreeSlot(duration, start): Find the first free time slot at or after start that is at least duration long (see the detailed explanation at the end).
  • Enqueue a job as early as possible on the multiple resources while respecting the constraints (mainly: correct order of tasks, consecutive tasks on the same resource), using FirstFreeSlot.
  • Put a job at a specific time and move the tail backwards.
  • Delete a job.
  • Recalculate: After a delete, test whether some tasks can be executed earlier.
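
To make the intended behaviour of FirstFreeSlot concrete, here is a minimal sketch for a single resource. It assumes the busy times are kept as a sorted list of non-overlapping half-open (begin, end) intervals; the name first_free_slot and this representation are only illustrative, and the linear scan is not meant as the efficient structure the question asks for.

```python
def first_free_slot(busy, duration, start=0):
    """Return the earliest t >= start such that [t, t + duration) is free.

    busy: list of half-open (begin, end) intervals, sorted by begin and
    non-overlapping (an assumption of this sketch).
    """
    t = start
    for begin, end in busy:
        if begin - t >= duration:
            break            # the gap before this busy interval is large enough
        t = max(t, end)      # otherwise continue after the busy interval
    return t


# Timeline from the first diagram: t1 = 20..30, t2.1 = 40..70, t2.2 = 70..120
timeline = [(20, 30), (40, 70), (70, 120)]
print(first_free_slot(timeline, 10, 0))    # 0, the gap 0..20 fits
print(first_free_slot(timeline, 10, 25))   # 30, the gap 30..40 fits exactly
print(first_free_slot(timeline, 15, 25))   # 120, only the open end fits
```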


Key Question

The point is: How can I represent this information to provide the functionality efficiently? Implementation is up to me ;-)

Update: A further point to consider: typical interval structures focus on "What is at point X?" In this case, however, enqueue matters most, and with it the question "Where is the first empty slot for duration D?" So a segment/interval tree or something else in this direction is probably not the right choice.

To elaborate further on the free slots: because we have multiple resources and the constraint of grouped tasks, there can be free time slots on some resources. Simple example: t1.1 runs on R1 for 40 and then t1.2 runs on R2. So there is an empty interval [0, 40] on R2 which can be filled by the next job.
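
The example can be written out as follows. This is only a sketch: the per-resource dict of busy intervals, the horizon parameter, and the length of t1.2 (30) are assumptions made for illustration.

```python
def free_slots(busy, horizon):
    """List the (start, length) gaps of one resource within [0, horizon)."""
    gaps, cursor = [], 0
    for begin, end in sorted(busy):
        if begin > cursor:
            gaps.append((cursor, begin - cursor))
        cursor = max(cursor, end)
    if cursor < horizon:
        gaps.append((cursor, horizon - cursor))
    return gaps


# t1.1 runs on R1 for 40, then t1.2 runs on R2 (its length is made up).
schedule = {"R1": [(0, 40)], "R2": [(40, 70)]}
for name, busy in schedule.items():
    print(name, free_slots(busy, horizon=100))
# R1 [(40, 60)]
# R2 [(0, 40), (70, 30)]
```

The slot (0, 40) on R2 is the empty interval described above.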


Update 2: There is an interesting proposal in another SO question. If someone can port it to my problem and show that it works for this case (especially when extended to multiple resources), this would probably be a valid answer.

schlamar asked Jun 22 '12 09:06


2 Answers

Let's restrict ourselves to the simplest case first: Find a suitable data structure that allows for a fast implementation of FirstFreeSlot().

The free time slots live in a two-dimensional space: One dimension is the start time s, the other is the length d. FirstFreeSlot(D) effectively answers the following query:

min s: d >= D

If we think of s and d as a Cartesian space (d=x, s=y), this means finding the lowest point in a half-plane bounded by a vertical line. A quad-tree, perhaps with some auxiliary information in each node (namely, min s over all leaves), will help to answer this query efficiently.

For Enqueue() in the face of resource constraints, consider maintaining a separate quad-tree for each resource. The quad-tree can also answer queries like

min s: s >= S & d >= D

(required for restricting the start time) in a similar fashion: now a quadrant is cut off (bounded by the lines s = S and d = D, open towards larger s and larger d), and we look for min s in that quadrant.
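
To illustrate what the query min s: s >= S & d >= D has to return, here is a sketch that deliberately swaps the quad-tree for a simpler structure: the free slots sorted by s, with a max-segment-tree over their lengths d. The class SlotIndex and its method names are hypothetical, and the sketch is static, so the updates needed for Put() and Delete() are not covered.

```python
import bisect


class SlotIndex:
    """Free slots sorted by start s, with a max-segment-tree over the lengths d."""

    def __init__(self, slots):
        self.slots = sorted(slots)                  # (s, d) pairs
        self.starts = [s for s, _ in self.slots]
        self.size = 1
        while self.size < len(self.slots):
            self.size *= 2
        self.tree = [-1] * (2 * self.size)          # -1 marks "no slot here"
        for i, (_, d) in enumerate(self.slots):
            self.tree[self.size + i] = d
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = max(self.tree[2 * node], self.tree[2 * node + 1])

    def first_free_slot(self, D, S=0):
        """Return the slot (s, d) with minimal s where s >= S and d >= D, or None."""
        lo = bisect.bisect_left(self.starts, S)     # index of the first slot with s >= S
        i = self._leftmost(1, 0, self.size, lo, D)
        return None if i is None else self.slots[i]

    def _leftmost(self, node, left, right, lo, D):
        # Node covers indices [left, right); find the first index >= lo with d >= D.
        if right <= lo or self.tree[node] < D:
            return None
        if right - left == 1:
            return left
        mid = (left + right) // 2
        found = self._leftmost(2 * node, left, mid, lo, D)
        if found is None:
            found = self._leftmost(2 * node + 1, mid, right, lo, D)
        return found


index = SlotIndex([(0, 20), (30, 10), (120, 1000)])
print(index.first_free_slot(10))         # (0, 20)
print(index.first_free_slot(10, S=25))   # (30, 10)
print(index.first_free_slot(15, S=25))   # (120, 1000)
print(index.first_free_slot(2000))       # None
```

As in the query above, only slots that begin at or after S are considered; a slot that begins before S but is still long enough after S would need extra handling.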

Put() and Delete() are simple update operations for the quad-tree.

Recalculate() can be implemented by Delete() + Put(). To avoid unnecessary operations, define sufficient (or, ideally, sufficient and necessary) conditions for triggering a recalculation. The Observer pattern might help here, but remember to put the tasks that need rescheduling into a FIFO queue or a priority queue sorted by start time. (You want to finish rescheduling the current task before moving on to the next.)
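
The priority-queue variant could look roughly like the sketch below. The two callbacks try_move_earlier and successors are hypothetical placeholders for the real slot search and dependency lookup, and the numbers in the example are made up.

```python
import heapq


def recalculate(pending, try_move_earlier, successors):
    """Process rescheduling requests one at a time, earliest start first.

    pending: iterable of (start, task_name) pairs to re-examine.
    try_move_earlier(name): returns the new start, or None if the task stays.
    successors(name): returns (start, task_name) pairs affected by a move.
    """
    heap = list(pending)
    heapq.heapify(heap)
    moved = []
    while heap:
        start, name = heapq.heappop(heap)      # task with the earliest start
        new_start = try_move_earlier(name)
        if new_start is not None and new_start < start:
            moved.append((name, start, new_start))
            for item in successors(name):      # followers may now move as well
                heapq.heappush(heap, item)
    return moved


starts = {"t1.2": 50, "t1.3": 90}
earliest = {"t1.2": 40, "t1.3": 80}            # made-up results of a slot search
follows = {"t1.2": ["t1.3"], "t1.3": []}

def try_move_earlier(name):
    if earliest[name] < starts[name]:
        starts[name] = earliest[name]
        return starts[name]
    return None

def successors(name):
    return [(starts[n], n) for n in follows[name]]

moved = recalculate([(50, "t1.2")], try_move_earlier, successors)
print(moved)   # [('t1.2', 50, 40), ('t1.3', 90, 80)]
```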

On a more general note, I'm sure you are aware that most kinds of scheduling problems, especially those with resource constraints, are at least NP-hard. So don't expect an algorithm with a decent runtime in the general case.

krlmlr answered Nov 10 '22 01:11


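from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    name: str = ''
    duration: int = 0
    resources: list = field(default_factory=list)  # resources this task can run on

@dataclass
class Job:
    name: str = ''
    tasks: list = field(default_factory=list)      # Task instances, in order

@dataclass
class Assignment:
    task: Optional[Task] = None
    resource: Optional[object] = None
    time: Optional[int] = None

class MultipleTimeline:
    def __init__(self):
        # created per instance; a class-level list() would be shared by all instances
        self.assignments = []
    def enqueue(self, job):
        pass
    def put(self, job):
        pass
    def delete(self, job):
        pass
    def recalculate(self):
        pass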

Is this a first step in the direction you are looking for, i.e. a data model written out in Python?

Update:

Here is my more efficient model:

It basically puts all Tasks in a linked list ordered by end time.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    name: str = ''
    duration: int = 0     # the amount of work to be done
    resources: int = 0    # bitmap that tells what resources this task uses
    # the following fields are only used when the task is scheduled
    next: Optional['Task'] = None          # the next scheduled task by end time
    resource: Optional['Resource'] = None  # the resource this task is scheduled on
    gap: Optional[int] = None  # time before the next scheduled task starts on this resource

@dataclass
class Job:
    id: int = 0
    tasks: list = field(default_factory=list)  # the Task instances of this job, in order

@dataclass
class Resource:
    bitflag: int = 0                  # a bit flag, combined bitwise with Task.resources
    firsttask: Optional[Task] = None  # the first Task instance scheduled on this resource
    gap: Optional[int] = None         # the amount of time before the first Task starts

class MultipleTimeline:
    def __init__(self):
        self.resources = []   # per instance, not shared between timelines
    def FirstFreeSlot(self):
        pass
    def enqueue(self, job):
        pass
    def put(self, job):
        pass
    def delete(self, job):
        pass
    def recalculate(self):
        pass

Because of the updates by enqueue and put I decided not to use trees. Because of put which moves tasks in time I decided not to use absolute times.
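
A small sketch of why relative times help with put. It simplifies the model to one resource with a plain list of (name, duration, gap_after) tuples, which is an assumption for illustration and not the linked list itself; absolute_times is a hypothetical helper.

```python
def absolute_times(first_gap, tasks):
    """Convert gap-relative tasks of one resource into (name, start, end) tuples.

    first_gap: time before the first task starts (like Resource.gap).
    tasks: list of (name, duration, gap_after) in execution order.
    """
    t = first_gap
    result = []
    for name, duration, gap_after in tasks:
        result.append((name, t, t + duration))
        t += duration + gap_after
    return result


# First diagram of the question: t1 = 20..30, t2.1 = 40..70, t2.2 = 70..120
tasks = [("t1", 10, 10), ("t2.1", 30, 0), ("t2.2", 50, 0)]
before = absolute_times(20, tasks)
print(before)   # [('t1', 20, 30), ('t2.1', 40, 70), ('t2.2', 70, 120)]

# Moving t2.1 five units later only changes the gap after t1;
# every task behind it shifts without being touched.
tasks[0] = ("t1", 10, 15)
after = absolute_times(20, tasks)
print(after)    # [('t1', 20, 30), ('t2.1', 45, 75), ('t2.2', 75, 125)]
```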

FirstFreeSlot returns not only the task with the free slot but also the other running tasks with their end times.

enqueue works as follows: we look for a free slot with FirstFreeSlot and schedule the task there. If there is enough space for the next task, we schedule it there too. If not, we check whether the other running tasks have free space. If not, we run FirstFreeSlot again with this time and the running tasks as parameters.
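
For comparison, here is a simplified greedy sketch of enqueue. It is not the gap-based procedure described above: it assumes absolute (begin, end) intervals per resource, lets a task wait after its predecessor, and never backtracks. All names and durations are made up for illustration.

```python
import bisect


def first_free_slot(busy, duration, start):
    """Earliest t >= start with [t, t + duration) free; busy is sorted, non-overlapping."""
    t = start
    for begin, end in busy:
        if begin - t >= duration:
            break
        t = max(t, end)
    return t


def enqueue(timelines, job):
    """Place the tasks of a job in order and return (task, resource, begin, end) tuples.

    timelines: dict resource -> sorted list of busy (begin, end) intervals (mutated).
    job: list of (task_name, duration, allowed_resources) in execution order.
    """
    assignments, ready, previous = [], 0, None
    for name, duration, allowed in job:
        # Consecutive tasks that can share a resource must share it.
        candidates = [previous] if previous in allowed else allowed
        begin, resource = min(
            (first_free_slot(timelines[r], duration, ready), r) for r in candidates
        )
        bisect.insort(timelines[resource], (begin, begin + duration))
        assignments.append((name, resource, begin, begin + duration))
        ready, previous = begin + duration, resource
    return assignments


timelines = {"R1": [], "R2": []}
job1 = [("t1.1", 40, ["R1"]), ("t1.2", 50, ["R2"]), ("t1.3", 30, ["R1", "R2"])]
job2 = [("t2.1", 30, ["R2"]), ("t2.2", 40, ["R1"])]
placed1 = enqueue(timelines, job1)
placed2 = enqueue(timelines, job2)
print(placed1)  # t1.3 stays on R2 because t1.2 ran there
print(placed2)  # t2.1 fills the free slot at the start of R2
```

With these numbers the result has the same shape as the "dynamic" diagram in the question: R1 carries t1.1 and t2.2, R2 carries t2.1, t1.2 and t1.3.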

Improvements: if put is not used very often and enqueue is done from time zero, we could keep track of the overlapping tasks by including a dict() per task that contains the other running tasks. Then it is also easy to keep a list() per Resource that contains the tasks scheduled on this Resource with absolute times, ordered by end time. Only those tasks are included that have bigger time gaps than before. Now we can find a free slot more easily.

Questions: Do tasks scheduled by put need to be executed at exactly that time? If yes: what if another task to be scheduled by put overlaps? Do all resources execute a task equally fast?

Marco de Wit answered Nov 10 '22 01:11