
Async/Await regarding system resources consumption and efficiency

Short version: how do async calls scale when async methods are called thousands and thousands of times in a loop, and these methods might call other async methods? Will my thread pool explode?

I've been reading and experimenting with the TPL and Async and after reading a lot of material I'm still confused about some aspects that I could not find much information about, like how async calls scale. I will try to go straight to the point.

Async calls
For IO, I read it is better to use async than a new thread/start a task, but from what I understand, performing an async operation without using a different thread is impossible, which means async must use other threads/start tasks at some point. So my question is: how would code A be better than code B regarding system resources?

Code A

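using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

// An array with 5000 URLs (filled in elsewhere).
var urls = new string[5000];

// List of awaitable tasks, one per request.
var tasks = new List<Task<string>>(5000);

// A single HttpClient instance shared by all requests.
var httpClient = new HttpClient();

foreach (string url in urls)
{
    // Starts the request and returns a task immediately; nothing is awaited here.
    tasks.Add(httpClient.GetStringAsync(url));
}

// Asynchronously wait until every request has completed.
await Task.WhenAll(tasks);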

Code B

...same variables as code A...

foreach (string url in urls)
{
    tasks.Add(
        Task.Factory.StartNew(() =>
        {
            // GetString is hypothetical: it represents a
            // synchronous version of GetStringAsync.
            return httpClient.GetString(url);
        })
    );
}

await Task.WhenAll(tasks);

Which leads me to the questions:
1 - should async calls be avoided in a loop?
2 - Is there a reasonable max of async calls that should be fired at a time, or is firing any number of async calls ok? How does this scale?
3 - Do async methods, under the hood, start a task for each call?

I tested this with 1000 urls and the number of used threadpool worker threads never even reached 30, and the number of IO completion threads is always about 5.

My Practical Experiment

I created a web application with a simple async controller. The page is composed of a single form with a textarea where the user enters all urls he wishes to request/do some work with.

Upon submission, the URLs are requested in a loop using the HttpClient.GetStringAsync method, just like code A above.

An interesting point is that if I submit 1000 urls, it takes about 3 minutes to finish all requests.

On the other hand, if I submit 3 forms from 3 different tabs (i.e. 3 clients), each with 1000 URLs, it takes much longer to get the results (about 10 minutes). This really got me confused: as per the MSDN definition, it should not take much longer than 3 minutes, especially since, even while all the requests are being processed at the same time, only about 25 thread pool threads are in use, which means resources are not being fully used at all!

The way it works now, this type of application is far from scalable (say I had about 5000 clients requesting a bunch of URLs all the time), and I fail to see how async is the way to fire multiple IO requests.

Further explanation about the application

Client side:
1. the user enters the site
2. types 1000 URLs into the text area
3. submits the URLs

Server side:
1. receives the URLs as an array
2. runs this code:

foreach (string url in urls)
{
    tasks.Add(GetUrlAsync(url));
}

await Task.WhenAll(tasks);
//at this point the thread is
// returned to the pool to receive
// further requests.
3. notifies the client that the work is done

Please, enlighten me! Thank you.

victor asked Sep 16 '15 22:09


1 Answer

from what I understand, performing an async operation without using a different thread is impossible, which means async must use other threads/start tasks at some point.

Nope. As I describe on my blog, pure async methods do not block threads.
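As a rough illustration of that point, here is a minimal sketch that starts thousands of overlapping waits and awaits them together. It assumes Task.Delay as a stand-in for a real I/O call (Task.Delay is timer-based, so no thread sleeps while a delay is pending); the class name NoThreadDemo and the counts are made up for the example.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class NoThreadDemo
{
    // Starts `count` overlapping delays and awaits them all.
    // Returns how many of them completed successfully.
    public static async Task<int> RunAsync(int count, int delayMs)
    {
        // Each Task.Delay call returns a pending task immediately;
        // no thread is parked waiting for any of them.
        var tasks = Enumerable.Range(0, count)
                              .Select(_ => Task.Delay(delayMs))
                              .ToList();

        await Task.WhenAll(tasks);
        return tasks.Count(t => t.Status == TaskStatus.RanToCompletion);
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        int completed = await RunAsync(5000, 1000);
        Console.WriteLine($"{completed} delays finished in {sw.ElapsedMilliseconds} ms");
    }
}
```

If every pending delay needed its own blocked thread, 5000 of them would require 5000 threads; with asynchronous waits the elapsed time should stay close to a single delay.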

So my question is: how would code A be better than code B regarding system resources?

A uses fewer threads than B.

(On a side note, do not use StartNew. It's horribly out-of-date and has very dangerous default parameter values. Use Task.Run instead. If you got this idea/code from a blog post or article, please pass the word along. StartNew is a cancer that seems to be taking over the Internet.)
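One of the StartNew pitfalls can be sketched as follows. StartNew does not understand async delegates, so it hands back a Task<Task<int>> that has to be unwrapped, whereas Task.Run returns the inner result directly. StartNew also schedules on TaskScheduler.Current by default instead of the thread pool scheduler. WorkAsync and the class name are hypothetical stand-ins.

```csharp
using System;
using System.Threading.Tasks;

public static class StartNewVsRun
{
    // Hypothetical async work standing in for a real I/O call.
    public static async Task<int> WorkAsync()
    {
        await Task.Delay(50);
        return 42;
    }

    public static async Task<int> ViaRunAsync()
    {
        // Task.Run has overloads for async delegates: the returned
        // Task<int> completes only when WorkAsync has finished.
        return await Task.Run(() => WorkAsync());
    }

    public static async Task<int> ViaStartNewAsync()
    {
        // StartNew returns Task<Task<int>>. The outer task completes as soon
        // as WorkAsync reaches its first await, not when the work is done.
        Task<Task<int>> outer = Task.Factory.StartNew(() => WorkAsync());

        // Unwrap() produces a proxy for the inner task, which is what we want.
        return await outer.Unwrap();
    }
}
```

Awaiting `outer` alone would yield a Task<int> that may still be running, which is an easy mistake to miss.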

should async calls be avoided in a loop?

Nope, that's fine.

Is there a reasonable max of async calls that should be fired at a time, or is firing any number of async calls ok?

Any number of them are fine, as long as your backend resource can handle it.
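If the backend cannot handle everything at once, one common way to cap the number of operations in flight is a SemaphoreSlim gate. This is only a sketch under that assumption, not something the answer prescribes; the Throttle class and its parameter names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Runs `work` for every item with at most `maxConcurrency` operations in flight.
    public static async Task<TResult[]> RunAsync<T, TResult>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Waits asynchronously for a free slot; no thread is blocked.
                await gate.WaitAsync();
                try { return await work(item); }
                finally { gate.Release(); }
            }).ToList(); // ToList forces the lazy Select to run, creating every task up front.

            // Results come back in the same order as the input items.
            return await Task.WhenAll(tasks);
        }
    }
}
```

With the variables from code A, usage might look like `var pages = await Throttle.RunAsync(urls, 20, url => httpClient.GetStringAsync(url));`, where 20 is an arbitrary limit to tune against the backend.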

How does this scale?

Asynchronous I/O on .NET almost always uses IOCPs (I/O Completion Ports) underneath, which are generally considered the most scalable form of I/O available on Windows.

Do async methods, under the hood, start a task for each call?

Yes and no. The execution of every asynchronous method is represented by a Task instance, but these do not represent running tasks - they don't represent a thread.

I call async tasks Promise Tasks, as opposed to Delegate Tasks (tasks that actually do run on the thread pool).
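The distinction can be sketched with TaskCompletionSource, which creates a task that has no code attached to it and is completed by whoever holds the source. Task.Run, by contrast, queues a delegate to the thread pool. The class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class PromiseVsDelegate
{
    // Reports the status of a Promise Task before and after it is completed.
    public static string Describe()
    {
        // Promise Task: there is no delegate to run, so no thread is tied up
        // while the task is pending.
        var tcs = new TaskCompletionSource<int>();
        Task<int> promise = tcs.Task;
        TaskStatus before = promise.Status;

        // Something else signals completion. For real I/O this would be
        // the completion notification rather than a direct call.
        tcs.SetResult(42);
        TaskStatus after = promise.Status;

        return $"{before} -> {after}";
    }

    // Delegate Task: the lambda is queued to the thread pool and runs on a worker thread.
    public static Task<int> DelegateTask()
    {
        return Task.Run(() => 21 * 2);
    }

    public static void Main()
    {
        Console.WriteLine(Describe());            // WaitingForActivation -> RanToCompletion
        Console.WriteLine(DelegateTask().Result); // 42
    }
}
```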

really got me confused

One thing to be aware of when you're testing URL requests is that there's automatic throttling for URL requests built into .NET. Try setting ServicePointManager.DefaultConnectionLimit to int.MaxValue.
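A minimal sketch of that setting, assuming the .NET Framework HTTP stack this question was asked about. On .NET Core and later, HttpClient does not go through ServicePointManager, and the equivalent knob is HttpClientHandler.MaxConnectionsPerServer. The HttpConfig class name is made up for the example.

```csharp
using System.Net;

public static class HttpConfig
{
    // Call once at application startup, before the first request is issued,
    // so that newly created connections pick up the limit.
    public static void RaiseConnectionLimit()
    {
        // Lifts the default cap on concurrent connections per host.
        ServicePointManager.DefaultConnectionLimit = int.MaxValue;
    }
}
```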

Stephen Cleary answered Oct 23 '22 22:10