How to efficiently make 1000s of web requests as quickly as possible

I need to make 100,000s of lightweight (i.e. small Content-Length) web requests from a C# console app. What is the fastest way I can do this (i.e. have completed all the requests in the shortest possible time) and what best practices should I follow? I can't fire and forget because I need to capture the responses.

Presumably I'd want to use the async web requests methods, however I'm wondering what the impact of the overhead of storing all the Task continuations and marshalling would be.

Memory consumption is not an overall concern, the objective is speed.

Presumably I'd also want to make use of all the cores available.

So I can do something like this:

Parallel.ForEach(iterations, async i =>
{
    var response = await MakeRequest(i);
    // do something with response
});

but that won't make me any faster than just my number of cores.

I can do:

Parallel.ForEach(iterations, i =>
{
    var response = MakeRequest(i);
    response.GetAwaiter().OnCompleted(() =>
    {
        // do thing with response
    });
});

but how do I keep my program running after the ForEach? Holding on to all the Tasks and WhenAll-ing them feels bloated; are there any existing patterns or helpers for some kind of Task queue?

Is there any way to get any better, and how should I handle throttling/error detection? For instance, if the remote endpoint is slow to respond I don't want to continue spamming it.

I understand I also need to do:

ServicePointManager.DefaultConnectionLimit = int.MaxValue;

Anything else necessary?

Andrew Bullock, asked Dec 24 '15

2 Answers

The Parallel class does not work with async loop bodies, so you can't use it. An async loop body completes almost immediately and returns a task, so there is no parallelism benefit here.

This is a very easy problem. Use one of the standard solutions for processing a series of items asynchronously with a given DOP (degree of parallelism). This one is good: http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx (use the last piece of code).
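That last piece of code is essentially the following extension method (reproduced here for convenience; the partitioning approach is from the linked article):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class EnumerableExtensions
{
    // Split the source into `dop` partitions and run one async worker
    // loop per partition, so at most `dop` bodies are in flight at once.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async () =>
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }
}
```

The loop then becomes `await iterations.ForEachAsync(50, async i => { var response = await MakeRequest(i); /* process it */ });`, with the DOP (50 here is just a placeholder) found by experiment.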

You need to determine the right DOP empirically. Simply try different values. There is no theoretical way to derive the best value because it depends on many factors.

The connection limit is the only limit that's in your way.

response.GetAwaiter().OnCompleted

Not sure what you were trying to accomplish there... If you comment I'll explain the misunderstanding.

usr, answered Sep 20 '22

The operation you want to perform is

  1. Call an I/O method
  2. Process the result

You are correct that you should use an async version of the I/O method. What's more, you only need 1 thread to start all of the I/O operations. You will not benefit from parallelism here.

You will benefit from parallelism in the second part - processing the result - as this is a CPU-bound operation. Luckily, async/await will do all the work for you. Console applications don't have a synchronization context. This means that the part of the method after an await will run on a thread pool thread, optimally utilizing all CPU cores.

private async Task MakeRequestAndProcessResult(int i)
{
    var result = await MakeRequestAsync(i);
    ProcessResult(result);
}

var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();

To achieve the same behavior in an environment with a synchronization context (for example WPF or WinForms), use ConfigureAwait(false).

var result = await MakeRequestAsync().ConfigureAwait(false);

To wait for the tasks to complete, you can use await Task.WhenAll(tasks) inside an async method or Task.WaitAll(tasks) in Main().
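Put together, a minimal console sketch looks like this (async Main requires C# 7.1 or later; the Task.Delay here is just a stand-in for the real request method):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    // Stand-in for the real request method; replace with your HTTP call.
    private static async Task MakeRequestAndProcessResult(int i)
    {
        await Task.Delay(1); // simulate I/O
        // ProcessResult(result) would run here, on a thread pool thread
    }

    static async Task Main()
    {
        var iterations = Enumerable.Range(0, 1000);

        // Start every request; each task processes its own response.
        var tasks = iterations.Select(i => MakeRequestAndProcessResult(i)).ToArray();

        // Keep the process alive until the last response is processed.
        await Task.WhenAll(tasks);

        Console.WriteLine("done");
    }
}
```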

Throwing 100k requests at a web service will probably kill it, so you will have to limit the rate. You can check the answers to this question to find some options for how to do it.
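One common option (a sketch with hypothetical helper names, not something from the linked question) is to gate the requests with a SemaphoreSlim so only a fixed number are in flight at once:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledRequests
{
    // Run `count` request bodies, allowing at most `dop` in flight at once.
    public static async Task RunAsync(int count, int dop, Func<int, Task> makeRequestAndProcess)
    {
        var throttle = new SemaphoreSlim(dop);
        var tasks = Enumerable.Range(0, count).Select(async i =>
        {
            await throttle.WaitAsync();        // wait for a free slot
            try { await makeRequestAndProcess(i); }
            finally { throttle.Release(); }    // free the slot even if the request failed
        });
        await Task.WhenAll(tasks);
    }
}
```

Because the semaphore is released in a finally block, a slow or failing endpoint just stalls its slot instead of letting the remaining requests pile on; you can also add a timeout or back-off inside the body.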

Jakub Lortz, answered Sep 20 '22