Parallel Linq - Use more threads than processors (for non-CPU bound tasks)

I'm using Parallel LINQ, and I'm trying to download many URLs concurrently using essentially code like this:

int threads = 10;
Dictionary<string, string> results = urls.AsParallel( threads ).ToDictionary( url => url, url => GetPage( url ) );

Since downloading web pages is network bound rather than CPU bound, using more threads than my number of processors/cores is very beneficial, since most of the time in each thread is spent waiting for the network to catch up. However, judging from the fact that running the above with threads = 2 has the same performance as threads = 10 on my dual-core machine, I'm thinking that the number of threads passed to AsParallel is limited to the number of cores.

Is there any way to override this behavior? Is there a similar library available that doesn't have this limitation?

(I've found such a library for Python, but need something that works in .NET)

Tristan Havelick asked Mar 04 '09 20:03


3 Answers

By default, .NET has a limit of 2 concurrent connections to a service endpoint (IP:port). That's why you would not see a difference if all the URLs point to the same server.

It can be controlled using the ServicePointManager.DefaultConnectionLimit property (DefaultPersistentConnectionLimit is the constant that holds the default value of 2).
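
As a minimal sketch, assuming the .NET Framework (where WebClient and HttpWebRequest honour ServicePointManager) and an illustrative limit of 10, the default could be raised once at startup, before any request is made:

```csharp
using System;
using System.Net;

public class ConnectionLimitSketch
{
    static void Main()
    {
        // DefaultPersistentConnectionLimit is a constant holding the
        // built-in default; the settable property is DefaultConnectionLimit.
        Console.WriteLine("Built-in default: {0}",
                          ServicePointManager.DefaultPersistentConnectionLimit);

        // Illustrative value, not a recommendation. The change only affects
        // ServicePoint objects created after this line runs.
        ServicePointManager.DefaultConnectionLimit = 10;
        Console.WriteLine("Current limit: {0}",
                          ServicePointManager.DefaultConnectionLimit);
    }
}
```

Because existing ServicePoint objects keep their old limit, the assignment belongs before the first download rather than inside GetPage.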

Sunny Milenov answered Nov 12 '22 15:11


Do the URLs refer to the same server? If so, it could be that you are hitting the HTTP connection limit instead of the threading limit. There's an easy way to tell - change your code to:

int threads = 10;
Dictionary<string, string> results = urls.AsParallel(threads)
    .ToDictionary(url => url, 
                  url => {
                      Console.WriteLine("On thread {0}",
                                        Thread.CurrentThread.ManagedThreadId);
                      return GetPage(url);
                  });

EDIT: Hmm. I can't get ToDictionary() to parallelise at all with a bit of sample code. It works fine for Select(url => GetPage(url)) but not ToDictionary. Will search around a bit.

EDIT: Okay, I still can't get ToDictionary to parallelise, but you can work around that. Here's a short but complete program:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Linq;
using System.Linq.Parallel;

public class Test
{

    static void Main()
    {
        var urls = Enumerable.Range(0, 100).Select(i => i.ToString());

        int threads = 10;
        Dictionary<string, string> results = urls.AsParallel(threads)
            .Select(url => new { Url=url, Page=GetPage(url) })
            .ToDictionary(x => x.Url, x => x.Page);
    }

    static string GetPage(string x)
    {
        Console.WriteLine("On thread {0} getting {1}",
                          Thread.CurrentThread.ManagedThreadId, x);
        Thread.Sleep(2000);
        return x;
    }
}

So, how many threads does this use? 5. Why? Goodness knows. I've got 2 processors, so that's not it - and we've specified 10 threads, so that's not it. It still uses 5 even if I change GetPage to hammer the CPU.
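
For readers on the released framework: the AsParallel(threads) overload comes from the preview builds and is not part of the PLINQ that shipped with .NET 4, where the degree is requested with WithDegreeOfParallelism instead. A hedged sketch of the same experiment follows. The class name, the Run helper, and the shortened 200 ms sleep are assumptions made for illustration, and the number of threads actually observed still depends on the scheduler and the thread pool.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public class DegreeSketch
{
    // Thread ids observed while the query runs (hypothetical helper state).
    static readonly ConcurrentDictionary<int, bool> SeenThreads =
        new ConcurrentDictionary<int, bool>();

    public static Dictionary<string, string> Run(int threads, int count)
    {
        var urls = Enumerable.Range(0, count).Select(i => i.ToString());

        // WithDegreeOfParallelism sets the maximum number of tasks that
        // process the query concurrently; it is an upper bound.
        return urls.AsParallel()
            .WithDegreeOfParallelism(threads)
            .Select(url => new { Url = url, Page = GetPage(url) })
            .ToDictionary(x => x.Url, x => x.Page);
    }

    static string GetPage(string x)
    {
        SeenThreads.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
        Thread.Sleep(200); // stands in for a blocking download
        return x;
    }

    static void Main()
    {
        var results = Run(10, 100);
        Console.WriteLine("{0} pages fetched on {1} distinct threads",
                          results.Count, SeenThreads.Count);
    }
}
```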

If you only need to use this for one particular task - and you don't mind slightly smelly code - you might be best off implementing it yourself, to be honest.
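
One way "implementing it yourself" might look, as a rough sketch rather than a recommended design: a fixed number of dedicated threads drain a shared queue of URLs. ManualFetcher, FetchAll, and the getPage delegate are hypothetical names introduced for this example, and the concurrent collections assume .NET 4 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class ManualFetcher
{
    // Runs getPage over urls on `threads` dedicated threads.
    public static Dictionary<string, string> FetchAll(
        IEnumerable<string> urls, int threads, Func<string, string> getPage)
    {
        var pending = new ConcurrentQueue<string>(urls);
        var pages = new ConcurrentDictionary<string, string>();

        var workers = Enumerable.Range(0, threads)
            .Select(_ => new Thread(() =>
            {
                string url;
                // Each worker keeps pulling until the queue is empty.
                while (pending.TryDequeue(out url))
                {
                    pages[url] = getPage(url);
                }
            }))
            .ToList();

        workers.ForEach(t => t.Start());
        workers.ForEach(t => t.Join()); // wait for every worker to finish
        return pages.ToDictionary(p => p.Key, p => p.Value);
    }

    static void Main()
    {
        var urls = Enumerable.Range(0, 20).Select(i => i.ToString());
        var results = FetchAll(urls, 10, url =>
        {
            Thread.Sleep(200); // stands in for a blocking download
            return url;
        });
        Console.WriteLine("{0} pages fetched", results.Count);
    }
}
```

The sketch does no error handling: an exception thrown by getPage on a worker thread would bring the process down, so real code would need to catch and record failures per URL.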

Jon Skeet answered Nov 12 '22 17:11


I think there are already good answers to the question, but I'd like to make one important point. Using PLINQ for tasks that are not CPU bound is, in principle, the wrong design. That's not to say it won't work (it will), but using multiple threads when it is unnecessary can cause trouble.

Unfortunately, there is no good way to solve this problem in C#. In F# you could use asynchronous workflows that run in parallel but don't block the thread when performing asynchronous calls (under the covers, they use BeginOperation and EndOperation methods). You can find more information here:

  • Concurrency in F# – Part I – The Asynchronous Workflow

The same idea can, to some extent, be used in C#; it looks a bit weird, but it is more efficient. I wrote an article about that, and there is also a library that should be slightly more evolved than my original idea:

  • Asynchronous Programming in C# using Iterators
  • EasyAsync library
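
Later versions of C# (5.0 onward) added async/await, which expresses the same non-blocking idea directly. A minimal sketch, assuming a caller-supplied getPageAsync delegate (hypothetical; in practice it might wrap something like HttpClient.GetStringAsync) and using SemaphoreSlim to cap the number of requests in flight. AsyncFetcher and FetchAllAsync are illustrative names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncFetcher
{
    public static async Task<Dictionary<string, string>> FetchAllAsync(
        IEnumerable<string> urls, int maxConcurrent,
        Func<string, Task<string>> getPageAsync)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync(); // waits without blocking a thread
                try
                {
                    string page = await getPageAsync(url);
                    return new KeyValuePair<string, string>(url, page);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList(); // ToList starts every task exactly once

            var pairs = await Task.WhenAll(tasks);
            return pairs.ToDictionary(p => p.Key, p => p.Value);
        }
    }

    static void Main()
    {
        var urls = Enumerable.Range(0, 20).Select(i => i.ToString());
        var results = FetchAllAsync(urls, 10, async url =>
        {
            await Task.Delay(200); // stands in for a real download
            return url;
        }).GetAwaiter().GetResult();
        Console.WriteLine("{0} pages fetched", results.Count);
    }
}
```

No thread sits idle while a request is pending here, so the concurrency cap is independent of the number of cores or pool threads.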

Tomas Petricek answered Nov 12 '22 16:11