What type of queue to use in parallel data processing - C# - .NET 4

Scenario: Data is received and written to a database with timestamps. I need to process the raw data in the order it was received, based on the timestamp, and write it back to a different table in the database, again preserving timestamp order.

I came up with the following design. I created two queues: one for storing raw data read from the database, and another for storing processed data before it is written back to the DB. I have two threads, one reading into the Initial queue and another reading from the Result queue. In between, I spawn multiple threads that process data from the Initial queue and write it to the Result queue.
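A minimal sketch of the design described above, using BlockingCollection<T> for both queues. The RawRow/ResultRow types and the Process method are hypothetical stand-ins for the real records and work; the reader and writer stand in for the database calls. Because several workers drain the Initial queue concurrently, the writer sees results in completion order, not timestamp order:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical row types standing in for the real database records.
class RawRow { public DateTime Timestamp; public int Value; }
class ResultRow { public DateTime Timestamp; public int Result; }

static class Pipeline
{
    // Hypothetical processing step; the real work would go here.
    static ResultRow Process(RawRow row)
    {
        return new ResultRow { Timestamp = row.Timestamp, Result = row.Value * 2 };
    }

    public static List<ResultRow> Run(IEnumerable<RawRow> source, int workerCount)
    {
        // Bounded queues keep memory in check if one stage runs slower than another.
        var initial = new BlockingCollection<RawRow>(100);
        var results = new BlockingCollection<ResultRow>(100);

        // Reader: fills the Initial queue (stands in for reading from the DB).
        var reader = Task.Factory.StartNew(() =>
        {
            foreach (var row in source) initial.Add(row);
            initial.CompleteAdding();
        }, TaskCreationOptions.LongRunning);

        // Workers: take from Initial, process, put into Result.
        var workers = Enumerable.Range(0, workerCount)
            .Select(i => Task.Factory.StartNew(() =>
            {
                foreach (var row in initial.GetConsumingEnumerable())
                    results.Add(Process(row));
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        // Close the Result queue once every worker has finished.
        Task.Factory.ContinueWhenAll(workers, tasks => results.CompleteAdding());

        // Writer: drains the Result queue (stands in for writing to the DB).
        // Items arrive in completion order, which need not match timestamp order.
        var written = new List<ResultRow>();
        foreach (var r in results.GetConsumingEnumerable()) written.Add(r);

        reader.Wait();
        return written;
    }
}
```

Every item gets through the pipeline, but with more than one worker nothing guarantees the written list is sorted by Timestamp.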

I have experimented with SortedList (with manual locking) and BlockingCollection. I have used two approaches to process in parallel: Parallel.For (and Parallel.ForEach) and Task.Factory.StartNew.

Each unit of data may take a variable amount of time to process, depending on several factors. One thread can still be processing the first data point while other threads have finished three or four data points each, which breaks the timestamp order.

I recently found out about OrderablePartitioner and thought it would solve the problem, but following MSDN's example I can see that it doesn't sort the underlying collection either. Maybe I need to implement a custom partitioner to order my collection of complex data types? Or maybe there's a better way to approach the problem?
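The built-in type here is OrderablePartitioner<T> in System.Collections.Concurrent, and the sketch below (with a hypothetical PartitionerDemo class) shows what it does provide: each element handed to the loop body carries its original position in the source as a long index. It does not reorder anything, so those indices only correspond to timestamp order if the source was already sorted by timestamp:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

static class PartitionerDemo
{
    // Returns the (index, item) pairs observed by the parallel loop body.
    public static List<KeyValuePair<long, string>> Run(IList<string> items)
    {
        var seen = new ConcurrentBag<KeyValuePair<long, string>>();

        // Partitioner.Create(IList<T>, bool) returns an OrderablePartitioner<T>:
        // every element keeps its original position, but nothing is sorted.
        OrderablePartitioner<string> partitioner = Partitioner.Create(items, true);

        // This Parallel.ForEach overload passes the element's source index as a long.
        Parallel.ForEach(partitioner, (item, state, index) =>
            seen.Add(new KeyValuePair<long, string>(index, item)));

        return new List<KeyValuePair<long, string>>(seen);
    }
}
```

For an unsorted source such as { "c", "a", "b" }, the element "a" is reported with index 1, its position in the source, not 0 as it would be in sorted order.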

Any suggestions and/or links to articles discussing a similar problem are highly appreciated.

Dimitri asked Apr 14 '11 at 17:04



1 Answer

Personally, I would start by using a BlockingCollection<T> for the input and a ConcurrentQueue<T> for the results.

I would use Parallel LINQ (PLINQ) to do the processing. To preserve the order during processing, you could call AsOrdered() on the PLINQ query.
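A minimal sketch of that suggestion: a BlockingCollection<T> as the input, a PLINQ query with AsOrdered() doing the parallel work, and a ConcurrentQueue<T> collecting the results. The RawRow/ResultRow types, the Process method, and the OrderedPipeline class are hypothetical; the sleep in Process only simulates uneven per-item cost. It assumes the producer adds rows in timestamp order (for example, by ordering the database query on the timestamp column), since AsOrdered() preserves the order in which items were taken from the input, not timestamp order as such:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical row types standing in for the real database records.
class RawRow { public DateTime Timestamp; public int Value; }
class ResultRow { public DateTime Timestamp; public int Result; }

static class OrderedPipeline
{
    // Hypothetical processing step; the sleep simulates uneven per-item cost.
    static ResultRow Process(RawRow row)
    {
        Thread.Sleep((row.Value % 3) * 5);
        return new ResultRow { Timestamp = row.Timestamp, Result = row.Value * 2 };
    }

    public static ConcurrentQueue<ResultRow> Run(RawRow[] rows)
    {
        var input = new BlockingCollection<RawRow>();
        var results = new ConcurrentQueue<ResultRow>();

        // Producer: adds rows in timestamp order.
        var producer = Task.Factory.StartNew(() =>
        {
            foreach (var row in rows.OrderBy(r => r.Timestamp)) input.Add(row);
            input.CompleteAdding();
        }, TaskCreationOptions.LongRunning);

        // PLINQ runs Process in parallel; AsOrdered() makes the query yield
        // results in the same order the items were taken from the input.
        var processed = input.GetConsumingEnumerable()
                             .AsParallel()
                             .AsOrdered()
                             .Select(row => Process(row));

        // Enqueue in order; a separate writer thread could drain this with TryDequeue.
        foreach (var r in processed) results.Enqueue(r);

        producer.Wait();
        return results;
    }
}
```

The processing itself still finishes out of order, but PLINQ buffers completed items and releases them to the foreach in input order, so the results queue ends up in timestamp order.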

Reed Copsey answered Oct 28 '22 at 20:10
