
Is it the correct implementation?

I have a Windows Service that needs to pick up jobs from a database and process them.

Each job is a scanning process that takes approximately 10 minutes to complete.

I am very new to the Task Parallel Library. I have implemented the following as sample logic:

Queue queue = new Queue();

for (int i = 0; i < 10000; i++)
{
    queue.Enqueue(i);
}

for (int i = 0; i < 100; i++)
{
    Task.Factory.StartNew((Object data ) =>
    {
        var Objdata = (Queue)data;
        Console.WriteLine(Objdata.Dequeue());
        Console.WriteLine(
            "The current thread is " + Thread.CurrentThread.ManagedThreadId);
    }, queue, TaskCreationOptions.LongRunning);
}

Console.ReadLine();

But this creates a lot of threads. Since the loop repeats 100 times, it creates 100 threads.

Is it the right approach to create that many parallel threads?

Is there any way to limit the number of threads to 10 (concurrency level)?

Sai Avinash asked Jun 24 '14 07:06


2 Answers

An important factor to remember when creating new threads is that the OS has to allocate several structures for each thread before it can run:

  1. Thread kernel object - an object describing the thread, including the thread's context (CPU registers, etc.)
  2. Thread environment block - for exception handling and thread-local storage
  3. User-mode stack - 1 MB by default on Windows
  4. Kernel-mode stack - for passing arguments from user mode to kernel mode

Beyond that, the number of threads that can actually run at the same time depends on the number of cores in your machine. Creating more threads than you have cores causes additional context switching, which in the long run may slow your work down.

So after the long intro, to the good stuff. What we actually want to do is limit the number of threads running and reuse them as much as possible.

For this kind of job, I would go with TPL Dataflow, which is built around the producer-consumer pattern. Here is a small example of what can be done:

// Requires: using System.Threading.Tasks.Dataflow;
// A BufferBlock is the equivalent of a ConcurrentQueue for buffering your objects
var bufferBlock = new BufferBlock<object>();

// An ActionBlock to process each object and do something with it
var actionBlock = new ActionBlock<object>(obj =>
{
     // Do stuff with the objects from the bufferblock
});

bufferBlock.LinkTo(actionBlock);
bufferBlock.Completion.ContinueWith(t => actionBlock.Complete());

You can pass the ActionBlock an ExecutionDataflowBlockOptions to set BoundedCapacity (the maximum number of objects the block will hold at once) and MaxDegreeOfParallelism (the maximum number of objects the block processes concurrently). The BufferBlock accepts a DataflowBlockOptions, which also exposes BoundedCapacity.
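To make the options concrete, here is a minimal sketch of an ActionBlock capped at a given concurrency level. The class name BoundedJobRunner, the int job IDs, and the Task.Delay standing in for the scan are all assumptions made for illustration, not part of the original question. On .NET Framework this needs the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper, not from the original post.
public static class BoundedJobRunner
{
    // Processes jobCount placeholder jobs with at most maxParallel running
    // at once. Returns the highest concurrency that was actually observed.
    public static async Task<int> RunAsync(int jobCount, int maxParallel)
    {
        int running = 0;
        int peak = 0;
        object gate = new object();

        var options = new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallel, // concurrency cap
            BoundedCapacity = maxParallel * 2     // back-pressure on the producer
        };

        var worker = new ActionBlock<int>(async jobId =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            await Task.Delay(20); // stand-in for the real scan (assumption)
            lock (gate) { running--; }
        }, options);

        for (int jobId = 0; jobId < jobCount; jobId++)
        {
            // SendAsync waits while the block is full instead of rejecting the item.
            await worker.SendAsync(jobId);
        }

        worker.Complete();       // no more jobs will be posted
        await worker.Completion; // wait for in-flight jobs to finish
        return peak;
    }
}
```

Calling BoundedJobRunner.RunAsync(10000, 10) would correspond to the 10,000 queued items and the concurrency level of 10 from the question; the returned peak should never exceed the cap.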

There is a good example here to get you started.

Yuval Itzchakov answered Sep 29 '22 03:09


Glad you asked, because you're right: this is not the best approach.

A Task should not be confused with a Thread. A Thread can be compared to a chef in a kitchen, while a Task is a dish ordered by a customer. You have a bunch of chefs, and they process the dish orders in some order (usually FIFO). A chef finishes a dish, then moves on to the next. The thread pool works the same way: you create a bunch of Tasks to be completed, but you do not need to assign a new thread to each one. In your sample, TaskCreationOptions.LongRunning hints to the default scheduler that each task should get its own dedicated thread, which is why you end up with 100 threads.
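To see the chefs-and-dishes idea in practice, here is a minimal sketch that queues many short tasks with Task.Run and counts how many distinct threads ended up running them. The class and method names are made up for illustration, and Thread.Sleep stands in for real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not from the original post.
public static class TaskVersusThreadDemo
{
    // Runs taskCount short tasks on the thread pool and returns the number
    // of distinct threads that executed them.
    public static int CountDistinctThreads(int taskCount)
    {
        var threadIds = new ConcurrentDictionary<int, bool>();

        Task[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Run(() =>
            {
                // Record which pool thread picked up this task.
                threadIds.TryAdd(Environment.CurrentManagedThreadId, true);
                Thread.Sleep(10); // stand-in for real work (assumption)
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return threadIds.Count;
    }
}
```

Calling TaskVersusThreadDemo.CountDistinctThreads(100) usually returns a number well below 100, because the pool reuses a small set of threads; the exact value depends on the machine and its load.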

OK, so on to the actual ways to do it. There are a few. The first is ThreadPool.QueueUserWorkItem (http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem(v=vs.110).aspx). From the Task Parallel Library, Parallel.For can also be used: it runs the iterations on thread-pool threads, and by default the pool starts with roughly one thread per CPU core and adds more only when needed.

Parallel.For(0, 100, i =>
{
    // This body is called 100 times, with i running from 0 to 99
    // (the upper bound is exclusive).
    WaitForGrassToGrow(); // placeholder for the real work
    Console.WriteLine("The {0}-th task has completed!", i);
});

Note that there is no guarantee that the method called by Parallel.For is called in sequence (0,1,2,3,4,5...). The actual sequence depends on the execution.
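As for capping the concurrency level at 10, Parallel.For has an overload that takes a ParallelOptions, whose MaxDegreeOfParallelism property sets the upper limit. The following is a minimal sketch under that assumption; the class name LimitedParallelFor and the Thread.Sleep stand-in are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not from the original post.
public static class LimitedParallelFor
{
    // Runs jobCount iterations with at most maxParallel in flight at once
    // and returns the peak concurrency that was observed.
    public static int Run(int jobCount, int maxParallel)
    {
        int running = 0;
        int peak = 0;
        object gate = new object();

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.For(0, jobCount, options, i =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            Thread.Sleep(20); // stand-in for the real work (assumption)
            lock (gate) { running--; }
        });

        return peak;
    }
}
```

LimitedParallelFor.Run(100, 10) runs the 100 iterations with no more than 10 at a time. Note that Parallel.For blocks the calling thread until every iteration has finished.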

kevin answered Sep 29 '22 02:09