
MPI distribution layer

I used MPI to write a distribution layer. Say we have n data sources and k data consumers. In my approach, each of the n MPI processes reads data and then distributes it to one or more of the k data consumers (other MPI processes) according to some given logic.

This seems very generic, so my question is: has something like this already been done?

It seems simple, but it can get very complicated. For example, the distribution layer might check which data consumers are ready for work (dynamic work distribution), or it might distribute data according to an algorithm that depends on the data itself. There are plenty of possibilities, and like everyone else I don't want to reinvent the wheel.

kokosing asked Nov 14 '22 16:11


1 Answer

As far as I know, there is no generic implementation for it, other than the MPI API itself. You should use the correct functions according to the problem's constraints.

If what you're trying to build is a simple n-producers-and-k-consumers synchronized job/data queue, then there are already many implementations out there (a quick search should turn up a few).

However, the way you present it seems very general - sometimes you want the data to be sent to only one consumer, sometimes to all of them, etc. In that case, you should figure out what you want and when, and use either point-to-point communication functions (e.g. MPI_Send/MPI_Recv) or collective communication functions (e.g. MPI_Bcast, MPI_Scatter) accordingly. And of course everyone has to know what to expect: a collective has to be called by every process in the communicator, so you can't have a consumer waiting in a receive for data from a single source while the producer wishes to broadcast the data.

All that aside, here is one implementation that comes to mind that seems to answer all of your requirements:
Make a synchronized queue, with producers pushing data in at one end and consumers taking it out at the other. Decide on the queue's behavior as you need: is the queue size limited, does adding an element to a full queue block or fail, does removing an element from an empty queue block or fail, etc.
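To make those behavior choices concrete, here is a minimal sketch using Python's standard `queue.Queue`. It only illustrates the policies inside a single process (between MPI ranks, the queue would have to live in one process that serves requests from the others, which is not shown here). The size limit of 2 and the item names are arbitrary assumptions.

```python
import queue

# Bounded queue: at most 2 items may wait at once (the limit is arbitrary).
jobs = queue.Queue(maxsize=2)

jobs.put("a")  # blocking put; returns immediately while there is room
jobs.put("b")

# Policy 1: adding to a full queue fails right away.
try:
    jobs.put_nowait("c")
    full_policy = "accepted"
except queue.Full:
    full_policy = "rejected"

# Policy 2: adding to a full queue blocks, but gives up after a timeout.
try:
    jobs.put("c", timeout=0.05)
    timed_policy = "accepted"
except queue.Full:
    timed_policy = "timed out"

# Consumers take items in FIFO order.
first = jobs.get()
second = jobs.get()

# Removing from an empty queue can fail instead of blocking.
try:
    jobs.get_nowait()
    empty_policy = "got item"
except queue.Empty:
    empty_policy = "empty"
```

A plain `put()` or `get()` with no timeout gives the fully blocking variant, which is usually what you want when consumers should simply wait for work.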
Assuming the data contains some flag that tells the consumers whether it is meant for everyone or for just one of them, each consumer peeks at the next element and either removes it, or leaves it in place and just notes that it has already handled it (either by keeping its id locally, or by changing a flag in the data itself).
If you don't want a single piece of collective data to block the queue until everyone has dealt with it, you can use 2 queues, one for each type of data, and have the consumers take data from one of the queues at a time (by alternating between the queues, choosing one at random, prioritizing one of them, or following some agreed order that can be deduced from the data, e.g. lowest id first).
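The two-queue variant could look roughly like the sketch below. It is an illustration under several assumptions, not a definitive implementation: the `Consumer` class and its field names are hypothetical, the "for everyone" queue is modeled as an append-only list in which each consumer keeps its own cursor (the "keep its id locally" option), broadcast data is always prioritized, and everything runs in one thread. A real version would need locking and some way to trim the list.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Consumer:
    name: str
    single_q: queue.Queue  # shared: each item goes to exactly one consumer
    broadcast_log: list    # shared, append-only: every consumer reads every item
    cursor: int = 0        # local note of how far into broadcast_log we got
    handled: list = field(default_factory=list)

    def take_one(self) -> bool:
        """Handle one item, broadcast data first; return False when idle."""
        if self.cursor < len(self.broadcast_log):
            # Leave the item in place and only advance the local cursor.
            self.handled.append(self.broadcast_log[self.cursor])
            self.cursor += 1
            return True
        try:
            # Removing the item guarantees no other consumer receives it.
            self.handled.append(self.single_q.get_nowait())
            return True
        except queue.Empty:
            return False


single_q = queue.Queue()
broadcast_log = []
a = Consumer("a", single_q, broadcast_log)
b = Consumer("b", single_q, broadcast_log)

# Producers: two jobs for single consumers, one message for everyone.
single_q.put("job-1")
single_q.put("job-2")
broadcast_log.append("config-v2")

# Drive both consumers in turn until neither has anything left to do.
busy = True
while busy:
    busy_a = a.take_one()
    busy_b = b.take_one()
    busy = busy_a or busy_b
```

Because the broadcast item is never removed, a slow consumer does not hold up the others, and each job from the shared queue still ends up with exactly one consumer.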

Sorry for the long answer, and I hope this helps :)

Eran Zimmerman Gonen answered Dec 30 '22 12:12