Optimized algorithm to schedule tasks with dependency?

Tags: algorithm, scheduled-tasks, scheduling

There are tasks that each read from a file, do some processing, and write to a file. The tasks must be scheduled according to their dependencies. Tasks can also run in parallel, so the algorithm should run dependent tasks serially and run everything else in parallel as far as possible.

e.g.:

1. A -> B
2. A -> C
3. B -> D
4. E -> F

One way to run this would be to run 1, 2 & 4 in parallel, followed by 3.

Another way would be to run 1 and then run 2, 3 & 4 in parallel.

Yet another would be to run 1 and 3 in serial, with 2 and 4 in parallel.

Any ideas?

237

asked Aug 19 '13 12:08

user2186138

2 Answers

Let each task (e.g. A, B, ...) be a node in a directed acyclic graph (DAG), and define the arcs between the nodes from your dependencies 1, 2, ....

You can then topologically order your graph (or use a search-based method like BFS). In your example, C<-A->B->D and E->F, so A & E have a depth of 0 and need to be run first. Then you can run F, B and C in parallel, followed by D.
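The depth idea above can be sketched roughly as follows. This is only an illustration: the names EDGES, depths and group_by_depth are made up for the example, and it assumes the graph really is acyclic (a cycle would make the recursion fail).

```python
from functools import lru_cache

# Edges from the question, written as (prerequisite, dependent).
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("E", "F")]

def depths(edges):
    """Map each task to its depth: 0 for roots, else 1 + the deepest prerequisite."""
    parents = {}
    for pre, dep in edges:
        parents.setdefault(pre, set())
        parents.setdefault(dep, set()).add(pre)

    @lru_cache(maxsize=None)
    def depth(node):
        # Assumes no cycles; a cyclic graph would recurse until RecursionError.
        if not parents[node]:
            return 0
        return 1 + max(depth(p) for p in parents[node])

    return {node: depth(node) for node in parents}

def group_by_depth(edges):
    """Return the tasks grouped by depth, shallowest group first."""
    levels = {}
    for node, d in depths(edges).items():
        levels.setdefault(d, []).append(node)
    return [sorted(levels[d]) for d in sorted(levels)]

print(group_by_depth(EDGES))  # [['A', 'E'], ['B', 'C', 'F'], ['D']]
```

Tasks that share a depth have no dependencies on each other, so each inner list can be run in parallel.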

Also, take a look at PERT.

Update:

How do you know whether B has a higher priority than F?

This is the intuition behind the topological sort used to find the ordering.

It first finds the root nodes, i.e. those with no incoming edges (at least one must exist in a DAG). In your case, that's A & E. This settles the first round of jobs that need to be completed. Next, the children of the root nodes (B, C and F) become candidates; a child is ready once all of its parents have finished. These are easily obtained by querying your graph. The process is then repeated until there are no unfinished nodes (jobs) left.
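The round-by-round process described above is essentially Kahn's algorithm processed one layer at a time. A minimal sketch, assuming edges are given as (prerequisite, dependent) pairs; the function name rounds is just for illustration:

```python
from collections import defaultdict

def rounds(edges):
    """Return a list of rounds; every task in a round has all prerequisites in earlier rounds."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for pre, dep in edges:
        children[pre].append(dep)
        indegree[dep] += 1
        nodes.update((pre, dep))

    # Roots: nodes with no incoming edges.
    ready = sorted(n for n in nodes if indegree[n] == 0)
    result = []
    finished = 0
    while ready:
        result.append(ready)
        finished += len(ready)
        next_ready = []
        for node in ready:
            for child in children[node]:
                indegree[child] -= 1
                # A child becomes ready only when its last parent has finished.
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if finished != len(nodes):
        raise ValueError("cyclic dependency detected")
    return result

print(rounds([("A", "B"), ("A", "C"), ("B", "D"), ("E", "F")]))
# [['A', 'E'], ['B', 'C', 'F'], ['D']]
```

Counting incoming edges avoids rescanning the whole graph on every round, so the work is proportional to the number of nodes plus edges (ignoring the sorting, which is only there to make the output deterministic).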

answered Oct 12 '22 05:10

Jacob

Given a mapping between items, and items they depend on, a topological sort orders items so that no item precedes an item it depends upon.
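As a quick way to make that definition concrete, here is a small checker for whether a given order respects the dependencies, tried against the standard library's graphlib.TopologicalSorter (Python 3.9+). The names DEPENDS_ON and is_valid_order are invented for this sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# item -> set of items it depends on (same shape as the mapping described above)
DEPENDS_ON = {"B": {"A"}, "C": {"A"}, "D": {"B"}, "F": {"E"}}

def is_valid_order(order, depends_on):
    """True if no item appears before an item it depends on."""
    position = {item: i for i, item in enumerate(order)}
    return all(position[dep] < position[item]
               for item, deps in depends_on.items()
               for dep in deps)

# static_order() yields one valid order; many others may exist.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(order, is_valid_order(order, DEPENDS_ON))
```

A topological order is generally not unique, which is why the check is written against the dependency mapping rather than against one fixed expected list.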

The Rosetta Code topological sort task has a Python solution that can also tell you which items are available to be processed in parallel.

Given your input the code becomes:

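try:
    from functools import reduce
except ImportError:
    pass  # Fallback for interpreters where reduce is only a builtin

data = { # From: http://stackoverflow.com/questions/18314250/optimized-algorithm-to-schedule-tasks-with-dependency
    # This   <-   This  (Reverse of how shown in question)
    'B':         set(['A']),
    'C':         set(['A']),
    'D':         set(['B']),
    'F':         set(['E']),
    }

def toposort2(data):
    """Yield one line per round: the items whose dependencies have all been met."""
    for k, v in data.items():
        v.discard(k) # Ignore self dependencies
    # Items that appear only as dependencies (here A and E) depend on nothing
    extra_items_in_deps = reduce(set.union, data.values()) - set(data.keys())
    data.update({item:set() for item in extra_items_in_deps})
    while True:
        # Everything with no unmet dependency can be processed now
        ordered = set(item for item,dep in data.items() if not dep)
        if not ordered:
            break
        yield ' '.join(sorted(ordered))
        # Drop the finished items and remove them from the remaining dependency sets
        data = {item: (dep - ordered) for item,dep in data.items()
                if item not in ordered}
    # Anything left over can never become ready, so it must be part of a cycle
    assert not data, "A cyclic dependency exists amongst %r" % data

print('\n'.join(toposort2(data)))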

Which then generates this output:

A E
B C F
D

Items on one line of the output can be processed in any order or, indeed, in parallel, as long as all items on earlier lines are processed before items on later lines, which preserves the dependencies.
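One possible way to act on that output is to hand each line to a thread pool and wait for it to drain before starting the next line. This is just a sketch: run_task is a stand-in for the real read/process/write job, and LEVELS is the output above typed in by hand.

```python
from concurrent.futures import ThreadPoolExecutor

# The three output lines above, one list per line.
LEVELS = [["A", "E"], ["B", "C", "F"], ["D"]]

def run_task(name):
    # Placeholder (assumption): the real task would read a file, process it, and write a file.
    return f"{name} done"

def run_levels(levels, worker=run_task, max_workers=4):
    """Run each level's tasks in parallel, finishing a level before starting the next."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in levels:
            # list(pool.map(...)) blocks until every task in this level has finished.
            results.append(list(pool.map(worker, level)))
    return results

print(run_levels(LEVELS))
# [['A done', 'E done'], ['B done', 'C done', 'F done'], ['D done']]
```

The barrier between levels is simple but slightly conservative: F only needs E, yet here it also waits for A. If that matters, graphlib.TopologicalSorter in Python 3.9+ exposes get_ready() and done() so a task can be started as soon as its own prerequisites finish.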

answered Oct 12 '22 07:10

Paddy3118
