
How to apply parallel programming to graph problems?

Problem Description:

There are n tasks, and some of them may depend on others, which means that if A depends on B, then B must be finished before A can be finished.

1. Find a way to finish these tasks as quickly as possible.

2. If parallelism is taken into account, how should the program be designed to finish these tasks?

Question:

Apparently, the answer to the first question is to topologically sort these tasks and then finish them in that order.

But how should the job be done when parallelism is taken into consideration?

My answer was: first topologically sort these tasks, then pick the tasks that are independent and finish them first, then pick and finish the independent ones among the rest, and so on.
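Roughly, I mean something like this sketch (the dict-of-dependencies representation and the task names are just made up for illustration):

```python
def schedule_in_layers(deps):
    """deps maps each task to the set of tasks it depends on.
    Returns a list of layers; tasks within one layer are independent."""
    remaining = {t: set(d) for t, d in deps.items()}
    layers = []
    while remaining:
        # Every task whose dependencies are all finished can run now.
        ready = [t for t, d in remaining.items() if not d]
        if not ready:
            raise ValueError("circular dependency detected")
        layers.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return layers

# Example: B depends on A; C depends on A, B and D.
print(schedule_in_layers({'A': set(), 'B': {'A'}, 'C': {'A', 'B', 'D'}, 'D': set()}))
# -> [['A', 'D'], ['B'], ['C']]
```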

Am I right?

Asked Sep 27 '11 by Alcott


2 Answers

Topological sort algorithms may give you different valid orders, so you cannot just take the first few elements and assume they are the independent ones.

Instead of topological sorting, I'd suggest sorting your tasks by the number of incoming dependency edges. For example, if your graph has A --> B, A --> C, B --> C, D --> C, you would sort it as A[0], D[0], B[1], C[3], where [i] is the number of incoming edges.
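As a small sketch of that counting step (the edge list below is just the example graph above):

```python
edges = [('A', 'B'), ('A', 'C'), ('B', 'C'), ('D', 'C')]
nodes = ['A', 'B', 'C', 'D']

# Count incoming edges (i.e. dependencies) per task.
indegree = {n: 0 for n in nodes}
for _, target in edges:
    indegree[target] += 1

# Sort by dependency count: [('A', 0), ('D', 0), ('B', 1), ('C', 3)]
print(sorted(indegree.items(), key=lambda kv: kv[1]))
```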

With topological sorting, you could also have gotten A, B, D, C. In that case, it wouldn't be easy to see that you can execute A and D in parallel.

Note that after a task has been completely processed, you then have to update the remaining tasks, in particular the ones that depended on the finished task. However, if the number of dependencies going into a task is limited to a relatively small number (say, a few hundred), you can rely on something like radix/bucket sort and keep the sort structure updated in constant time.

With this approach, you can also easily start new tasks as soon as a single parallel task has finished: simply update the dependency counts and start all tasks that now have 0 unmet dependencies, as in the sketch below.
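A minimal sketch of that update loop using a thread pool (the `run_task` function and the graph representation are placeholders, not from the question):

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_task(task):
    # Placeholder for the real work of a task.
    print("running", task)

def run_parallel(deps, dependents, max_workers=4):
    """deps: task -> number of unfinished dependencies.
    dependents: task -> list of tasks that depend on it."""
    remaining = dict(deps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start every task that has no dependencies.
        pending = {pool.submit(run_task, t): t
                   for t, n in remaining.items() if n == 0}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                finished = pending.pop(fut)
                # Update dependency counts; start tasks that reach zero.
                for nxt in dependents.get(finished, ()):
                    remaining[nxt] -= 1
                    if remaining[nxt] == 0:
                        pending[pool.submit(run_task, nxt)] = nxt

# Same example graph: A->B, A->C, B->C, D->C
run_parallel({'A': 0, 'B': 1, 'C': 3, 'D': 0},
             {'A': ['B', 'C'], 'B': ['C'], 'D': ['C']})
```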

Note that this approach assumes you have enough processing power to process all tasks that have no dependencies at the same time. If you have limited resources and care about an optimal solution in terms of processing time, then you'd have to invest more effort, as the problem becomes NP-hard (as arne already mentioned).

So, to answer your original question: yes, you are basically right; however, you did not explain how to determine those independent tasks efficiently (see my example above).

Answered Oct 21 '22 by Frank


I would try sorting them into a directed forest structure with task execution time as edge weights. Order the arborescences from heaviest to lightest and start with the heaviest. Using this approach, you can also check for circular dependencies at the same time.

Using parallelism, you get a bin-packing-style scheduling problem, which is NP-hard. Try looking up approximation algorithms for that problem.
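For instance, one well-known greedy heuristic for the closely related makespan-scheduling problem is the longest-processing-time rule; this sketch ignores dependencies and uses made-up task durations:

```python
import heapq

def lpt_schedule(durations, machines):
    """Greedy LPT: assign each task (longest first) to the machine
    that is currently least loaded. A classic approximation heuristic."""
    loads = [(0, m) for m in range(machines)]  # (current load, machine id)
    heapq.heapify(loads)
    assignment = {}
    for task, time in sorted(durations.items(), key=lambda kv: -kv[1]):
        load, m = heapq.heappop(loads)
        assignment[task] = m
        heapq.heappush(loads, (load + time, m))
    return assignment

# Hypothetical durations, 2 machines.
print(lpt_schedule({'A': 3, 'B': 2, 'C': 2, 'D': 1}, 2))
```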

Answered Oct 21 '22 by arne