Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimized algorithm to schedule tasks with dependency?

There are tasks that read from a file, do some processing and write to a file. These tasks are to be scheduled based on the dependency. Also tasks can be run in parallel, so the algorithm needs to be optimized to run dependent tasks in serial and as much as possible in parallel.

eg:

  1. A -> B
  2. A -> C
  3. B -> D
  4. E -> F

So one way to run this would be run 1, 2 & 4 in parallel. Followed by 3.

Another way could be run 1 and then run 2, 3 & 4 in parallel.

Another could be run 1 and 3 in serial, 2 and 4 in parallel.

Any ideas?

like image 237
user2186138 Avatar asked Aug 19 '13 12:08

user2186138


People also ask

How do you set a dependency on a task?

Set Default Dependency TypeClick in the upper-right corner of Zoho Projects and select Portal Configuration in the left panel. Select Task & Timesheet in the left panel and scroll down to find Task Dependency. Select the Default Dependency Type drop-down menu to choose the type of dependency. Click Update.

Which graph algorithm build tools like make used to efficiently determine the order in which build dependencies?

Which of the following standard graph algorithms is used by Make. Explanation: Make can decide the order of building a software using topological sorting. Topological sorting produces the order considering all dependencies provide by makefile.

What are dependencies for tasks?

Task dependencies represent the order in which tasks must be performed. Dependencies enable you to work out the optimal task order, providing the fastest route through the project. There are 4 types of dependency relationships. A task can have multiple successors or multiple predecessors.

What is task scheduling in parallel computing?

Task scheduling in parallel processing is a technique in which processes are assigned to different processors. Task scheduling in parallel processing use different types of algorithms and techniques which are used to reduce the number of delayed jobs.


2 Answers

Let each task (e.g. A,B,...) be nodes in a directed acyclic graph and define the arcs between the nodes based on your 1,2,....

http://en.wikipedia.org/wiki/Topological_sorting

You can then topologically order your graph (or use a search based method like BFS). In your example, C<-A->B->D and E->F so, A & E have depth of 0 and need to be run first. Then you can run F,B and C in parallel followed by D.

Also, take a look at PERT.

Update:

How do you know whether B has a higher priority than F?

This is the intuition behind the topological sort used to find the ordering.

It first finds the root (no incoming edges) nodes (since one must exist in a DAG). In your case, that's A & E. This settles the first round of jobs which needs to be completed. Next, the children of the root nodes (B,C and F) need to be finished. This is easily obtained by querying your graph. The process is then repeated till there are no nodes (jobs) to be found (finished).

like image 58
Jacob Avatar answered Oct 12 '22 05:10

Jacob


Given a mapping between items, and items they depend on, a topological sort orders items so that no item precedes an item it depends upon.

This Rosetta code task has a solution in Python which can tell you which items are available to be processed in parallel.

Given your input the code becomes:

try:
    from functools import reduce
except:
    pass

data = { # From: http://stackoverflow.com/questions/18314250/optimized-algorithm-to-schedule-tasks-with-dependency
    # This   <-   This  (Reverse of how shown in question)
    'B':         set(['A']),
    'C':         set(['A']),
    'D':         set(['B']),
    'F':         set(['E']),
    }

def toposort2(data):
    for k, v in data.items():
        v.discard(k) # Ignore self dependencies
    extra_items_in_deps = reduce(set.union, data.values()) - set(data.keys())
    data.update({item:set() for item in extra_items_in_deps})
    while True:
        ordered = set(item for item,dep in data.items() if not dep)
        if not ordered:
            break
        yield ' '.join(sorted(ordered))
        data = {item: (dep - ordered) for item,dep in data.items()
                if item not in ordered}
    assert not data, "A cyclic dependency exists amongst %r" % data

print ('\n'.join( toposort2(data) ))

Which then generates this output:

A E
B C F
D

Items on one line of the output could be processed in any sub-order or, indeed, in parallel; just so long as all items of a higher line are processed before items of following lines to preserve the dependencies.

like image 33
Paddy3118 Avatar answered Oct 12 '22 07:10

Paddy3118