Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Hashed/Sharded ActionBlocks

I have a constant flow of certain items that I need to process in parallel so I'm using TPL Dataflow. The catch is that the items that share the same key (similar to a Dictionary) should be processed in a FIFO order and not be parallel to each other (they can be parallel to other items with different values).

The work being done is very CPU bound with minimal asynchronous locks so my solution was to create an array of ActionBlock<T>s the size of Environment.ProcessorCount with no parallelism and post to them according to the key's GetHashCode value.

Creation:

_actionBlocks = new ActionBlock<Item>[Environment.ProcessorCount];
for (int i = 0; i < _actionBlocks.Length; i++)
{
    _actionBlocks[i] = new ActionBlock<Item>(_ => ProcessItemAsync(_));
}

Usage:

bool ProcessItem(Key key, Item item)
{
    var actionBlock = _actionBlocks[(uint)key.GetHashCode() % _actionBlocks.Length];
    return actionBlock.Post(item);
}

So, my question is, is this the best solution to my problem? Am I hurting performance/scalability? Am I missing something?

like image 683
i3arnon Avatar asked Jan 09 '14 01:01

i3arnon


1 Answers

I think your approach is reasonable, assuming you know the hash codes will be distributed well.

If you want to have a better protection against bad distributions, you could use larger number of ActionBlocks while limiting their total concurrency level by using a single custom TaskScheduler shared by all blocks. You can find such scheduler in ParallelExtensionsExtras or on MSDN.

like image 80
svick Avatar answered Nov 13 '22 03:11

svick