Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Design Pattern Alternative to Coroutines

Currently, I have a large number of C# computations (method calls) residing in a queue that will be run sequentially. Each computation will use some high-latency service (network, disk...).

I was going to use Mono coroutines to allow the next computation in the computation queue to continue while a previous computation is waiting for the high latency service to return. However, I prefer to not depend on Mono coroutines.

Is there a design pattern that's implementable in pure C# that will enable me to process additional computations while waiting for high latency services to return?

Thanks

Update:

I need to execute a huge number (>10000) of tasks, and each task will be using some high-latency service. On Windows, you can't create that much threads.

Update:

Basically, I need a design pattern that emulates the advantages (as follows) of tasklets in Stackless Python (http://www.stackless.com/)

  1. Huge # of tasks
  2. If a task blocks the next task in the queue executes
  3. No wasted cpu cycle
  4. Minimal overhead switching between tasks
like image 420
jameszhao00 Avatar asked Aug 23 '09 19:08

jameszhao00


3 Answers

You can simulate cooperative microthreading using IEnumerable. Unfortunately this won't work with blocking APIs, so you need to find APIs that you can poll, or which have callbacks that you can use for signalling.

Consider a method

IEnumerable Thread ()
{
    //do some stuff
    Foo ();

    //co-operatively yield
    yield null;

    //do some more stuff
    Bar ();

    //sleep 2 seconds
    yield new TimeSpan (2000);
}

The C# compiler will unwrap this into a state machine - but the appearance is that of a co-operative microthread.

The pattern is quite straightforward. You implement a "scheduler" that keeps a list of all the active IEnumerators. As it cycles through the list, it "runs" each one using MoveNext (). If the value of MoveNext is false, the thread has ended, and the scheduler removes it from the list. If it's true, then the scheduler accesses the Current property to determine the current state of the thread. If it's a TimeSpan, the thread wishes to sleep, and the scheduler moved it onto some queue that can be flushed back into the main list when the sleep timespans have ended.

You can use other return objects to implement other signalling mechanisms. For example, define some kind of WaitHandle. If the thread yields one of these, it can be moved to a waiting queue until the handle is signalled. Or you could support WaitAll by yielding an array of wait handles. You could even implement priorities.

I did a simple implementation of this scheduler in about 150LOC but I haven't got round to blogging the code yet. It was for our PhyreSharp PhyreEngine wrapper (which won't be public), where it seems to work pretty well for controlling a couple of hundred characters in one of our demos. We borrowed the concept from the Unity3D engine -- they have some online docs that explain it from a user point of view.

like image 196
Mikayla Hutchinson Avatar answered Nov 07 '22 06:11

Mikayla Hutchinson


.NET 4.0 comes with extensive support for Task parallelism:

  • How to: Use Parallel.Invoke to Execute Simple Parallel Tasks
  • How to: Return a Value from a Task
  • How to: Chain Multiple Tasks with Continuations
like image 35
dtb Avatar answered Nov 07 '22 05:11

dtb


I'd recommend using the Thread Pool to execute multiple tasks from your queue at once in manageable batches using a list of active tasks that feeds off of the task queue.

In this scenario your main worker thread would initially pop N tasks from the queue into the active tasks list to be dispatched to the thread pool (most likely using QueueUserWorkItem), where N represents a manageable amount that won't overload the thread pool, bog your app down with thread scheduling and synchronization costs, or suck up available memory due to the combined I/O memory overhead of each task.

Whenever a task signals completion to the worker thread, you can remove it from the active tasks list and add the next one from your task queue to be executed.

This will allow you to have a rolling set of N tasks from your queue. You can manipulate N to affect the performance characteristics and find what is best in your particular circumstances.

Since you are ultimately bottlenecked by hardware operations (disk I/O and network I/O, CPU) I imagine smaller is better. Two thread pool tasks working on disk I/O most likely won't execute faster than one.

You could also implement flexibility in the size and contents of the active task list by restricting it to a set number of particular type of task. For example if you are running on a machine with 4 cores, you might find that the highest performing configuration is four CPU-bound tasks running concurrently along with one disk-bound task and a network task.

If you already have one task classified as a disk IO task, you may choose to wait until it is complete before adding another disk IO task, and you may choose to schedule a CPU-bound or network-bound task in the meanwhile.

Hope this makes sense!

PS: Do you have any dependancies on the order of tasks?

like image 5
jscharf Avatar answered Nov 07 '22 04:11

jscharf