
C# Multi-threaded app - structure?

I'm making an application that checks whether links are accessible (live). My question is how to keep the threads "always busy". What I mean is: the app runs 100 threads (created with a for loop, for example) with 100 different URLs. When one of the threads finishes its job (checking whether a URL is available), it should immediately get a new URL and start again, so the 100 threads work non-stop until all URLs are checked.

How can I accomplish that?

asked Jul 20 '12 by user1410644

2 Answers

What you are looking for is called the Producer-Consumer model. You have a pool of resources that contains the list of URLs to check; one thread can fill that pool, and your consumer threads can pull from it. If you have .NET 4, Parallel.ForEach does most of the work for you.

Using 100 threads is also very unlikely to be the optimal thread count; just let the Task Parallel Library manage the number of threads for you.

Here is an example for the case where the list is pre-populated and no more items are added while it is running.

//Parallel.ForEach will block until it is done, so you may want to run this function on a background worker.
public void StartThreads()
{
    List<string> myListOfUrls = GetUrls();

    Parallel.ForEach(myListOfUrls, ProcessUrl);
}


private void ProcessUrl(string url)
{
    //Do your work here, this code will be run from multiple threads.
}

If you need to populate the collection as it runs, replace List<string> with a concurrent collection like BlockingCollection<string>:

BlockingCollection<string> myListOfUrls = new BlockingCollection<string>();

//Parallel.ForEach will block until it is done, so you may want to run this function on a background worker.
public void StartThreads()
{
    if(myListOfUrls.IsCompleted)
    {
        //The collection has emptied itself and you told it you were done using it; you will either need to throw an exception or make a new collection.
        //Use IsAddingCompleted to check whether you told it you are done adding, since there may still be members left to process.
        throw new InvalidOperationException();
    }

    //We create a Partitioner to remove the buffering behavior of Parallel.ForEach; this gives better performance with a BlockingCollection.
    var partitioner = Partitioner.Create(myListOfUrls.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering);
    Parallel.ForEach(partitioner, ProcessUrl);
}

public void StopThreads()
{
    myListOfUrls.CompleteAdding();
}

public void AddUrl(string url)
{
    myListOfUrls.Add(url);
}

private void ProcessUrl(string url)
{
    //Do your work here, this code will be run from multiple threads.
}

I also want to add that the automatic thread scheduling may not be best either; it imposes limits that can be worth raising. See this comment on the original question:

For those, who said/upvoted 100 thread is a terrible idea: On my dual core 2GB RAM XP machine Parallel.Foreach never created more than 5 threads(unless I set ThreadPool.SetMinThreads) and creating 100 threads resulted always ~30-40% faster operation. So don't leave everything to Parallel.Foreach . PS: My test code WebClient wc = new WebClient();var s = wc.DownloadString(url); (google's home page) – L.B
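If you do want to force a higher degree of parallelism for IO-bound work like this, ParallelOptions.MaxDegreeOfParallelism is the supported knob. A minimal sketch (the probe delegate and the value 100 are illustrative, not recommendations; the caller supplies the actual network check):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class UrlChecker
{
    // Returns the number of "live" URLs. The probe delegate does the actual
    // network call, so this method only deals with the degree of parallelism.
    public static int CheckAll(IEnumerable<string> urls,
                               Func<string, bool> probe,
                               int maxParallelism)
    {
        int liveCount = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(urls, options, url =>
        {
            if (probe(url))
                Interlocked.Increment(ref liveCount); // thread-safe counter
        });

        return liveCount;
    }
}
```

Note that MaxDegreeOfParallelism is only a cap: the thread pool still injects new threads gradually, which is why L.B needed ThreadPool.SetMinThreads to see the speedup quickly on blocking calls like WebClient.DownloadString.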

answered Oct 10 '22 by Scott Chamberlain


Use the Parallel Extensions CTP; the Parallel.ForEach method it includes will do exactly what you want.

Google is your friend.

Also, using 100 threads may not be best for performance; I would use however many cores are available.
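That advice can be sketched as follows, assuming a hypothetical per-URL worker; Environment.ProcessorCount gives the number of logical cores:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        string[] urls = { "http://example.com/1", "http://example.com/2" };

        // Cap the parallelism at the number of available cores, as suggested above.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.ForEach(urls, options, url =>
        {
            // Hypothetical per-URL work goes here.
            Console.WriteLine("Checking " + url);
        });
    }
}
```

Keep in mind this sizing fits CPU-bound work; for network-bound checks like these, more concurrent operations than cores can pay off, as the comment quoted in the first answer shows.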

answered Oct 10 '22 by Marlon