
Best multi-thread approach for multiple web requests

I want to create a program that crawls my websites and checks them for HTTP errors and other problems. I want to do this with multiple threads, each of which accepts parameters such as the URL to crawl. At any time X threads should be active while Y tasks are already queued and waiting to be executed.

What is the best strategy for this: ThreadPool, Tasks, Threads, or something else?

maddo7 asked Dec 27 '22 06:12


2 Answers

Here's an example that shows how to queue up a batch of tasks while limiting how many run concurrently. It uses a Queue to hold tasks that are ready to run and a Dictionary to track tasks that are running. When a task finishes, it invokes a callback method that removes it from the Dictionary. An async method launches queued tasks as space becomes available.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

namespace MinimalTaskDemo
{
    class Program
    {
        private static readonly Queue<Task> WaitingTasks = new Queue<Task>();
        private static readonly Dictionary<int, Task> RunningTasks = new Dictionary<int, Task>();
        public static int MaxRunningTasks = 100; // vary this to dynamically throttle launching new tasks

        static void Main(string[] args)
        {
            var tokenSource = new CancellationTokenSource();
            var token = tokenSource.Token;
            Worker.Done = new Worker.DoneDelegate(WorkerDone);
            for (int i = 0; i < 1000; i++)  // queue some tasks
            {
                // task state (i) will be our key for RunningTasks
                WaitingTasks.Enqueue(new Task(id => new Worker().DoWork((int)id, token), i, token));
            }
            LaunchTasks();      // starts the launcher loop; returns at its first await
            Console.ReadKey();  // keep the process alive; press a key to stop
            if (RunningTasks.Count > 0)
            {
                // stopping early: drop the queued tasks and cancel the running ones
                lock (WaitingTasks) WaitingTasks.Clear();
                tokenSource.Cancel();
                Console.ReadKey();
            }
        }

        static async void LaunchTasks()
        {
            // keep checking until we're done
            while ((WaitingTasks.Count > 0) || (RunningTasks.Count > 0))
            {
                // launch tasks when there's room
                while ((WaitingTasks.Count > 0) && (RunningTasks.Count < MaxRunningTasks))
                {
                    Task task;
                    lock (WaitingTasks)
                    {
                        if (WaitingTasks.Count == 0) break;  // queue was cleared by Main
                        task = WaitingTasks.Dequeue();
                    }
                    lock (RunningTasks) RunningTasks.Add((int)task.AsyncState, task);
                    task.Start();
                }
                UpdateConsole();
                await Task.Delay(300); // wait before checking again
            }
            UpdateConsole();    // all done
        }

        static void UpdateConsole()
        {
            Console.Write(string.Format("\rwaiting: {0,3:##0}  running: {1,3:##0} ", WaitingTasks.Count, RunningTasks.Count));
        }

        // callback from finished worker
        static void WorkerDone(int id)
        {
            lock (RunningTasks) RunningTasks.Remove(id);
        }
    }

    internal class Worker
    {
        public delegate void DoneDelegate(int taskId);
        public static DoneDelegate Done { private get; set; }
        private static readonly Random Rnd = new Random();

        public async void DoWork(object id, CancellationToken token)
        {
            int steps;
            lock (Rnd) steps = Rnd.Next(20);  // Random is not thread-safe; pick the step count once
            for (int i = 0; i < steps; i++)
            {
                if (token.IsCancellationRequested) break;
                await Task.Delay(100);  // simulate work
            }
            Done((int)id);  // tell Program this task finished so it leaves RunningTasks
        }
    }
}
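
For comparison, the same idea (many queued work items, at most N running at once) can also be sketched with a SemaphoreSlim instead of a hand-rolled queue and polling loop. This is not part of the answer above, just a minimal alternative under some assumptions: the names Throttle and RunAsync are hypothetical, and the code assumes C# 8 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Hypothetical helper: runs `work` once per item, with at most
    // `maxConcurrent` invocations in flight at any moment.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        int maxConcurrent,
        Func<TItem, CancellationToken, Task<TResult>> work,
        CancellationToken token = default)
    {
        using var gate = new SemaphoreSlim(maxConcurrent);

        async Task<TResult> RunOne(TItem item)
        {
            await gate.WaitAsync(token);      // wait for a free slot
            try
            {
                return await work(item, token);
            }
            finally
            {
                gate.Release();               // free the slot even if work throws
            }
        }

        // ToList creates every task up front; the semaphore does the throttling.
        var tasks = items.Select(RunOne).ToList();

        // Results come back in the same order as the input items.
        return await Task.WhenAll(tasks);
    }
}
```

A call might look like `var codes = await Throttle.RunAsync(urls, 10, (url, ct) => CheckAsync(url, ct));`, where CheckAsync is a placeholder for whatever per-URL check you write. Unlike the polling loop, a waiting item starts as soon as a slot is released.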
Ed Power answered Jan 08 '23 02:01


I recommend using asynchronous Tasks for downloading the data and then processing it on the thread pool.

Instead of throttling tasks, I recommend you throttle the number of requests per target server. Good news: .NET already does this for you.

This makes your code as simple as:

private static readonly HttpClient client = new HttpClient();
public async Task Crawl(string url)
{
  // download asynchronously; no thread is blocked while waiting on the network
  var html = await client.GetStringAsync(url);
  // parse on the thread pool
  var nextUrls = await Task.Run(() => ProcessHtml(html));
  var nextTasks = nextUrls.Select(nextUrl => Crawl(nextUrl));
  await Task.WhenAll(nextTasks);
}
private IEnumerable<string> ProcessHtml(string html)
{
  // Placeholder: return all URLs found in the HTML string.
  throw new NotImplementedException();
}

which you can kick off with a simple:

await Crawl("http://example.org/");
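
Since the question is about checking sites for HTTP errors, here is a slightly fuller sketch of the same recursive pattern. It makes several assumptions that are not in the answer above: the class name SiteChecker, the regex-based link extraction (a real crawler would more likely use an HTML parser), the same-host restriction, and an explicit cap of 4 connections per server. The cap is set explicitly because the default per-server limit differs between .NET Framework and newer .NET runtimes. A visited set is added so that pages linking to each other are not fetched forever.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public sealed class SiteChecker
{
    // Assumption: 4 concurrent connections per host; tune this for your servers.
    private static readonly HttpClient Client = new HttpClient(
        new HttpClientHandler { MaxConnectionsPerServer = 4 });

    // Naive href matcher; stops at '#' so fragments are dropped.
    private static readonly Regex Href = new Regex(
        "href\\s*=\\s*[\"']([^\"'#]+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // URL -> HTTP status code (0 means the request itself failed).
    public ConcurrentDictionary<string, int> Results { get; } =
        new ConcurrentDictionary<string, int>();

    public async Task CrawlAsync(Uri url)
    {
        // TryAdd is atomic, so each URL is fetched at most once, even with link cycles.
        if (!Results.TryAdd(url.AbsoluteUri, 0)) return;

        string html;
        try
        {
            using var response = await Client.GetAsync(url);
            Results[url.AbsoluteUri] = (int)response.StatusCode;
            if (!response.IsSuccessStatusCode) return;   // record the error, do not parse
            html = await response.Content.ReadAsStringAsync();
        }
        catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
        {
            return;   // connection failure or timeout; status stays 0
        }

        // Parse on the thread pool, then crawl the discovered links concurrently.
        var next = await Task.Run(() => ExtractLinks(url, html));
        await Task.WhenAll(next.Select(CrawlAsync));
    }

    // Resolves hrefs against the page URL and keeps http(s) links on the same host.
    public static List<Uri> ExtractLinks(Uri page, string html)
    {
        var links = new List<Uri>();
        foreach (Match m in Href.Matches(html))
        {
            if (Uri.TryCreate(page, m.Groups[1].Value, out var link)
                && (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps)
                && link.Host == page.Host)
            {
                links.Add(link);
            }
        }
        return links;
    }
}
```

After `var checker = new SiteChecker(); await checker.CrawlAsync(new Uri("http://example.org/"));`, the Results dictionary can be filtered for entries whose status is 0 or 400 and above.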
Stephen Cleary answered Jan 08 '23 02:01
