
Parallel.For loop - Assigning a unique data entity for each thread

I have 100 records to process in parallel, numbered 1 to 100. I can conveniently use Parallel.For to execute them in parallel as follows, and the degree of parallelism will depend on the available computing resources:

 Parallel.For(0, limit, i =>
    {
        DoWork(i);
    });

However, there are certain restrictions. Each thread needs to work with its own copy of the data entity, and only a limited number of copies exists, say 10. They are created in advance by cloning one another and stored in a structure such as a Dictionary or List. I can restrict the degree of parallelism using the following code:

 Parallel.For(0, limit, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
    {
        DoWork(i);
    });

The issue is how to assign a unique data entity to each incoming thread, so that the entity is not used by any other thread that is currently executing. Since the number of threads equals the number of data entities, starvation is not an issue. One way I can think of is to keep a boolean value for each data entity that says whether it is in use. A thread then iterates through the dictionary or list to find the next available entity, and the whole assignment step is locked so that only one thread is assigned an entity at a time. In my view, though, this problem should have a much more elegant solution; my version is just a workaround, not really a fix. My logic is:

Parallel.For(0, limit, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
        {
            lock (All_Threads_Common_Object)
            {
                // Check for an available data entity using its boolean flag
                // Assign the data entity to this thread
            }
            DoWork(i);

            // Reset the boolean flag so another thread can use the entity
        });

Please let me know if the question needs further clarification

Mrinal Kamboj asked Dec 14 '22 18:12


2 Answers

Use the overload of Parallel.For that accepts a thread-local initialization function.

Parallel.For<DataEntity>(0, limit,
    // runs once for each worker thread (task) that takes part in the loop
    () => GetThreadLocalDataEntity(),

    // main loop body, runs once per iteration
    (i, loop, threadDataEntity) =>
    {
        DoWork(i, threadDataEntity);
        return threadDataEntity; // we must return it here to match the Func signature
    },

    // runs once for each worker thread (task) after it has finished its iterations
    (threadDataEntity) => threadDataEntity.Dispose() // if necessary
);

The main advantage of this method over the one you posted in the question is that a DataEntity is assigned once per worker thread, not once per loop iteration.
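
One possible way to fill in GetThreadLocalDataEntity, as a rough sketch only: keep the pre-cloned entities in a BlockingCollection, borrow one in the initialization function and hand it back in the final function. DataEntity (with its InUse demo field), EntityPoolDemo and Run are hypothetical names, and the SpinWait call merely stands in for DoWork. MaxDegreeOfParallelism is set to the number of entities, as in the question. The .NET documentation describes the initialization and final delegates as running once per participating task, and a loop may go through more tasks over its lifetime than run at any one moment, which is why the sketch returns the entity to the pool instead of disposing it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's data entity.
public sealed class DataEntity
{
    public int Id { get; }
    public int InUse;          // 0 = free, 1 = held by a worker (demo check only)
    public DataEntity(int id) => Id = id;
}

public static class EntityPoolDemo
{
    // Returns how many times an entity was observed in use by two workers at once.
    public static int Run(int limit, int entityCount)
    {
        var pool = new BlockingCollection<DataEntity>(new ConcurrentQueue<DataEntity>());
        for (int id = 0; id < entityCount; id++) pool.Add(new DataEntity(id));

        int conflicts = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = entityCount };

        Parallel.For(0, limit, options,
            // Runs when a worker joins the loop: borrow an entity (blocks if none is free).
            () => pool.Take(),
            (i, loopState, entity) =>
            {
                // Flag the entity as busy; seeing it already busy would mean it is shared.
                if (Interlocked.Exchange(ref entity.InUse, 1) == 1)
                    Interlocked.Increment(ref conflicts);
                Thread.SpinWait(1000);               // placeholder for DoWork(i, entity)
                Interlocked.Exchange(ref entity.InUse, 0);
                return entity;                       // same entity goes to this worker's next iteration
            },
            // Runs when the worker leaves the loop: give the entity back to the pool.
            entity => pool.Add(entity));

        return conflicts;
    }
}
```

Calling EntityPoolDemo.Run(100, 10) mirrors the numbers in the question. Using Take rather than TryTake is a deliberate choice here: if a worker ever starts before another has returned its entity, it waits instead of failing.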

Rotem answered May 10 '23 03:05


You can use a concurrent collection to store your 10 objects. Each worker pulls one data entity out, uses it, and gives it back. Using a concurrent collection is important, because in your scenario the ordinary collections are not thread-safe.

Like so:

var queue = new ConcurrentQueue<DataEntity>();
// fill the queue with 10 items

Parallel.For(0, limit, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
    {
        DataEntity x;
        if(!queue.TryDequeue(out x))
            throw new InvalidOperationException();
        DoWork(i, x);
        queue.Enqueue(x);
    });

Or, if blocking is needed, wrap the queue in a BlockingCollection.

Edit: Do not wrap the TryDequeue call in a loop to keep waiting. Instead, use the BlockingCollection like this:

var entities = new BlockingCollection<DataEntity>(new ConcurrentQueue<DataEntity>());

// fill the collection with 10 items

Parallel.For(0, limit, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
    {
        DataEntity x = entities.Take();
        DoWork(i, x);
        entities.Add(x);
    });
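
One thing to watch in the borrow-and-return pattern: if DoWork throws, the entity is never added back and the pool shrinks. A minimal sketch that returns the entity in a finally block follows. DataEntity, BorrowDemo, Run and the doWork delegate are hypothetical names used for illustration, and swallowing the AggregateException is only there to keep the demo short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's data entity.
public sealed class DataEntity
{
    public int Id { get; }
    public DataEntity(int id) => Id = id;
}

public static class BorrowDemo
{
    // Runs doWork for the loop indices and returns how many entities are back in the pool afterwards.
    public static int Run(int limit, int entityCount, Action<int, DataEntity> doWork)
    {
        var entities = new BlockingCollection<DataEntity>(new ConcurrentQueue<DataEntity>());
        for (int id = 0; id < entityCount; id++) entities.Add(new DataEntity(id));

        var options = new ParallelOptions { MaxDegreeOfParallelism = entityCount };
        try
        {
            Parallel.For(0, limit, options, i =>
            {
                DataEntity x = entities.Take();   // blocks until an entity is free
                try
                {
                    doWork(i, x);
                }
                finally
                {
                    entities.Add(x);              // returned even when doWork throws
                }
            });
        }
        catch (AggregateException)
        {
            // Parallel.For wraps exceptions thrown by the body; ignored here for the demo only.
        }

        return entities.Count;
    }
}
```

With this shape, a call such as BorrowDemo.Run(100, 10, (i, x) => DoWork(i, x)) leaves every entity back in the pool whether or not some iterations fail.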

DasKrümelmonster answered May 10 '23 04:05