Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Thread.Sleep blocking parallel execution of tasks

I'm calling a worker method that calls to the database that then iterates and yield returns values for parallel processing. To prevent it from hammering the database, I have a Thread.Sleep in there to pause the execution to the DB. However, this appears to be blocking executions that are still occurring in the Parallel.ForEach. What is the best way to achieve this to prevent blocking?

private void ProcessWorkItems()
{
    _cancellation = new CancellationTokenSource();
    _cancellation.Token.Register(() => WorkItemRepository.ResetAbandonedWorkItems());

    Task.Factory.StartNew(() =>
        Parallel.ForEach(GetWorkItems().AsParallel().WithDegreeOfParallelism(10), workItem =>
        {
            var x = ItemFactory(workItem);
            x.doWork();
        }), _cancellation.Token);
}

private IEnumerable<IAnalysisServiceWorkItem> GetWorkItems()
{
    while (!_cancellation.IsCancellationRequested)
    {
        var workItems = WorkItemRepository.GetItemList(); //database call

        workItems.ForEach(item =>
        {
            item.QueueWorkItem(WorkItemRepository);
        });

        foreach (var item in workItems)
        {
            yield return item;
        }

        if (workItems.Count == 0)
        {
            Thread.Sleep(30000); //sleep this thread for 30 seconds if no work items.
        }
    }

    yield break;
}

Edit: I changed it to include the answer and it's still not working as I'm expecting. I added the .AsParallel().WithDegreeOfParallelism(10) to the GetWorkItems() call. Are my expectations incorrect when I think that Parallel should continue to execute even though the base thread is sleeping?

Example: I have 15 items, it iterates and grabs 10 items and starts them. As each one finishes, it asks for another one from GetWorkItems until it tries to ask for a 16th item. At that point it should stop trying to grab more items but should continue processing items 11-15 until those are complete. Is that how parallel should be working? Because it's not currently doing that. What it's currently doing is when it completes 6, it locks the subsequent 10 still being run in the Parallel.ForEach.

like image 918
Conway Stern Avatar asked Dec 12 '22 09:12

Conway Stern


2 Answers

I would suggest that you create a BlockingCollection (a queue) of work items, and a timer that calls the database every 30 seconds to populate it. Something like:

BlockingCollection<WorkItem> WorkItems = new BlockingCollection<WorkItem>();

And on initialization:

System.Threading.Timer WorkItemTimer = new Timer((s) =>
    {
        var items = WorkItemRepository.GetItemList(); //database call
        foreach (var item in items)
        {
            WorkItems.Add(item);
        }
    }, null, 30000, 30000);

That will query the database for items every 30 seconds.

For scheduling the work items to be processed, you have a number of different solutions. The closest to what you have would be this:

WorkItem item;

while (WorkItems.TryTake(out item, Timeout.Infinite, _cancellation))
{
    Task.Factory.StartNew((s) =>
        {
            var myItem = (WorkItem)s;
            // process here
        }, item);
}

That eliminates blocking in any of the threads, and lets the TPL decide how best to allocate the parallel tasks.

EDIT:

Actually, closer to what you have is:

foreach (var item in WorkItems.GetConsumingEnumerable(_cancellation))
{
    // start task to process item
}

You might be able to use:

Parallel.Foreach(WorkItems.GetConsumingEnumerable(_cancellation).AsParallel ...

I don't know if that will work or how well. Might be worth a try . . .

END OF EDIT

In general, what I'm suggesting is that you treat this as a producer/consumer application, with the producer being the thread that queries the database periodically for new items. My example queries the database once every N (30 in this case) seconds, which will work well if, on average, you can empty your work queue every 30 seconds. That will give an average latency of less than a minute from the time an item is posted to the database until you have the results.

You can reduce the polling frequency (and thus the latency), but that will cause more database traffic.

You can get fancier with it, too. For example, if you poll the database after 30 seconds and you get a huge number of items, then it's likely that you'll be getting more soon, and you'll want to poll again in 15 seconds (or less). Conversely, if you poll the database after 30 seconds and get nothing, then you can probably wait longer before you poll again.

You can set up that kind of adaptive polling using a one-shot timer. That is, you specify -1 for the last parameter when you create the timer, which causes it to fire only once. Your timer callback figures out how long to wait before the next poll and calls Timer.Change to initialize the timer with the new value.

like image 104
Jim Mischel Avatar answered Dec 14 '22 23:12

Jim Mischel


You can use the .WithDegreeOfParallelism() extension method to force PLinq to run the tasks simultaneously. There's a good example in the Call Blocking or I/O Intensive section in th C# Threading Handbook

like image 39
scottm Avatar answered Dec 14 '22 22:12

scottm