I have to download about 16k documents and the same number of HTML pages from the internet. This number will increase in the future. Currently I am just using Parallel.ForEach to download and work on the data in parallel. This, however, does not seem to fully utilize my resources, so I am planning to bring async/await into play to have as many downloads running asynchronously as possible, but I will probably have to limit that.
How many open connections can a single HttpClient have? What other factors will I have to keep in mind when creating such a number of connections? I am aware that I should reuse the same HttpClient, and I have also read this answer, but I have doubts that I can really have several billion connections open at once.
First, good call on switching from Parallel.ForEach to async/await. By breaking from the constraints of threads, you'll be able to increase concurrency by orders of magnitude.
"I have doubts that I can really have several billion connections open at once."
Let's say you could. Do you think the job would complete any faster than if you had, say, 1000 open at once? The limitation you're going to bump up against first is bandwidth (or possibly the server refusing requests), not concurrent connections. So I would suggest the max number of connections you can possibly have open at once isn't even relevant if your goal is to complete the job as fast as possible.
That said, there are default limits imposed by .NET. Assuming you're on full framework or .NET Core 2.x, the limit can be changed programmatically via ServicePointManager.DefaultConnectionLimit, which has a default value of just 2. Set it to something much bigger.
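As a minimal sketch of that setting, assuming it is applied once at startup before the first request is sent: the class name ConnectionLimitSetup and the value 100 are illustrative assumptions, not recommendations.

```csharp
using System.Net;

public static class ConnectionLimitSetup
{
    // Call once at startup, before any request goes out.
    // The default of 100 here is an arbitrary illustrative value.
    public static void Apply(int limit = 100)
    {
        // Raises the maximum number of concurrent connections per endpoint.
        ServicePointManager.DefaultConnectionLimit = limit;
    }
}
```

If you build the HttpClient from your own HttpClientHandler, its MaxConnectionsPerServer property is a per-handler knob for the same kind of limit and may be the more relevant one on newer runtimes.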
Next I would suggest setting up your code to perform the downloads concurrently up to some limit, using either SemaphoreSlim or TPL Dataflow. Both approaches are well covered in answers to this question. Then start experimenting until you come up with an optimal number. Hard to say what that is. Maybe start with 50. If it goes well, increase it to 100 and see if the overall job completes any faster. If you start getting socket exceptions or errors returned from the server, dial it down.
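Here is a rough sketch of the SemaphoreSlim variant, to show what "concurrently up to some limit" can look like. The names ThrottledDownloader, RunThrottledAsync and DownloadAllAsync are made up for illustration, and the default of 50 is just the starting point suggested above. It has no retry or error handling, so treat it as a starting point rather than a finished implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloader
{
    // One shared HttpClient for the whole job, as the question already notes.
    private static readonly HttpClient Client = new HttpClient();

    // Runs `work` for every item, with at most `maxConcurrency` running at once.
    // Results come back in the same order as the input items.
    public static async Task<TResult[]> RunThrottledAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        int maxConcurrency,
        Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Wait (without blocking a thread) until a slot is free.
                await gate.WaitAsync();
                try
                {
                    return await work(item);
                }
                finally
                {
                    // Free the slot even if the work item failed.
                    gate.Release();
                }
            }).ToList();

            // Completes when every item has finished; rethrows if any failed.
            return await Task.WhenAll(tasks);
        }
    }

    // Hypothetical convenience wrapper: downloads each URL as a string.
    public static Task<string[]> DownloadAllAsync(
        IEnumerable<string> urls, int maxConcurrency = 50)
    {
        return RunThrottledAsync(urls, maxConcurrency, url => Client.GetStringAsync(url));
    }
}
```

Usage would be something like `var pages = await ThrottledDownloader.DownloadAllAsync(urls, 50);`, then rerun with 100 and compare the total time. Keeping the throttle separate from the HTTP call means you can change the limit, or swap in different per-item work, without touching the rest.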