Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Time optimization of parallel long-running tasks

Tags:

Intro

I am working with a complex external library, where I am attempting to execute it's functionality on a large list of items. The library does not expose a good async interface, so I'm stuck with some quite old-fashioned code.

My aim is to optimize the time it takes to complete a batch of processing, and to demonstrate the problem without having to include the actual 3rd party library I have created an approximation of the problem below

Problem

Given a non-async action, where you can know the "size" (ie, complexity) of the action in advance:

public interface IAction
{
    int Size { get; }
    void Execute();
}

And given there are 3 variants of this action:

public class LongAction : IAction
{
    public int Size => 10000;
    public void Execute()
    {
        Thread.Sleep(10000);
    }
}

public class MediumAction : IAction
{

    public int Size => 1000;
    public void Execute()
    {
        Thread.Sleep(1000);
    }
}

public class ShortAction : IAction
{
    public int Size => 100;
    public void Execute()
    {
        Thread.Sleep(100);
    }
}

How could you optimize a long list of these actions, so that when run in some sort of parallel fashion, the entire batch completes as fast as possible?

Naively, you could just throw the entire lot at a Parallel.ForEach, with a reasonably high Parallelism and that certainly works - but there must be a way to optimally schedule them so some of the biggest are started first.

To further illustrate the problem, if we take a super-simplified example

  • 1 task of size 10
  • 5 tasks of size 2
  • 10 tasks of size 1

And 2 available threads. I could come up with 2 (of many) ways to schedule these tasks (The black bar being dead time - nothing to schedule):

enter image description here

Clearly the first one completes earlier than the second.

Minimal Complete & Verifiable Code

Entire test code if anyone fancies a bash (try to make it faster than my naive implementation below):

class Program
{
    static void Main(string[] args)
    {
        MainAsync().GetAwaiter().GetResult();
        Console.ReadLine();
    }

    static async Task MainAsync()
    {
        var list = new List<IAction>();
        for (var i = 0; i < 200; i++) list.Add(new LongAction());
        for (var i = 0; i < 200; i++) list.Add(new MediumAction());
        for (var i = 0; i < 200; i++) list.Add(new ShortAction());


        var swSync = Stopwatch.StartNew();
        Parallel.ForEach(list, new ParallelOptions { MaxDegreeOfParallelism = 20 }, action =>
        {
            Console.WriteLine($"{DateTime.Now:HH:mm:ss}: Starting action {action.GetType().Name} on thread {Thread.CurrentThread.ManagedThreadId}");
            var sw = Stopwatch.StartNew();
            action.Execute();
            sw.Stop();
            Console.WriteLine($"{DateTime.Now:HH:mm:ss}: Finished action {action.GetType().Name} in {sw.ElapsedMilliseconds}ms on thread {Thread.CurrentThread.ManagedThreadId}");
        });
        swSync.Stop();
        Console.WriteLine($"Done in {swSync.ElapsedMilliseconds}ms");
    }
}


public interface IAction
{
    int Size { get; }
    void Execute();
}

public class LongAction : IAction
{
    public int Size => 10000;
    public void Execute()
    {
        Thread.Sleep(10000);
    }
}

public class MediumAction : IAction
{

    public int Size => 1000;
    public void Execute()
    {
        Thread.Sleep(1000);
    }
}

public class ShortAction : IAction
{
    public int Size => 100;
    public void Execute()
    {
        Thread.Sleep(100);
    }
}
like image 949
Jamiec Avatar asked Jan 08 '19 12:01

Jamiec


2 Answers

A relatively quick and dirty solution is to use a load-balancing partitioner on top of an action list sorted by decreasing size

var sorted = list.OrderByDescending(a => a.Size).ToArray();
var partitioner=Partitioner.Create(sorted, loadBalance:true);

Parallel.ForEach(partitioner, options, action =>...);

Using just these two lines performans improves by about ~30%, just like the other answers.

PLINQ partitions data and uses a separate tasks to process an entire partition at a time. When the input size is known, as is the case with IList-derived arrays and lists, the input is partitioned into equally sized chunks and fed to each worker task.

When the size isn't known, as is the case with iterator methods, LINQ queries etc, PLINQ uses chunk partitioning. A chunk of data is retrieved at a time and fed to the worker tasks.

Another option, which I'd forgotten, is load balancing on top chunck partitioning. This applies chunk partitioning using small chunks to arrays and IList-derived input. The load-balancing Partitioner.Create overloads return OrderablePartitioner instances, so the order of the IAction items is preserved

The same can be achieved with an IEnumerable<T> source by specifying the EnumerablePartitionerOptions.NoBuffering option :

var sorted = list.OrderByDescending(a => a.Size);
var partitioner=Partitioner.Create(sorted,EnumerablePartitionerOptions.NoBuffering);

This will create an OrderablePartitioner that uses chunk encoding

like image 75
Panagiotis Kanavos Avatar answered Oct 21 '22 17:10

Panagiotis Kanavos


Just off the bat I'd say that the problem is as follows.

You have a list of integers and a limited number of summers. You want an algorithm that sums the integers into the summers so that the maximum value of summers is the minimum possible.

E.g.:

list = 1, 4, 10, 2, 3, 4
summers = 3

summer(1): 10
summer(2): 4 + 4
summer(3): 3 + 2 + 1

As you can see the bounding factor is the longest running task. The shorter ones are easily served in parallel or in less time. It's similar to Knapsack, but in the end boils down to a very simple "longest first" ordering of the tasks.

The pseudo code (with my invented classes) would be:

while (taskManager.HasTasks())
{
    task = taskManager.GetLongestTask();
    thread = threadManager.GetFreeThread(); // blocks if no thread available
    thread.Run(task);
}

This is just pseudo code, not parallel/async and blocks. I hope you can derive from it something useful.

like image 27
pid Avatar answered Oct 21 '22 17:10

pid