
How many connections in HttpClient

Background

I have to download about 16k documents and the same number of HTML pages from the internet. This number will increase in the future. Currently I am just using Parallel.ForEach to download and process the data in parallel. This, however, does not seem to fully utilize my resources, so I am planning to bring async/await into play to have as many downloads running asynchronously as possible, though I will probably have to limit that.

Actual Question

How many open connections can a single HttpClient have? What other factors will I have to keep in mind when creating that many connections? I am aware that I should reuse the same HttpClient and I have also read this answer, but I have doubts that I can really have several billion connections open at once.

Jerome Reinländer asked Jul 11 '18 12:07


1 Answer

First, good call on switching from Parallel.ForEach to async/await. By breaking from the constraints of threads, you'll be able to increase concurrency by orders of magnitude.

"I have doubts that I can really have several billion connections open at once."

Let's say you could. Do you think the job would complete any faster than if you had, say, 1000 open at once? The limitation you're going to bump up against first is bandwidth (or possibly the server refusing requests), not concurrent connections. So I would suggest the max number of connections you can possibly have open at once isn't even relevant if your goal is to complete the job as fast as possible.

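That said, there are default limits imposed by .NET, and they apply per host rather than per HttpClient. On the full framework, the limit can be changed programmatically via ServicePointManager.DefaultConnectionLimit, which has a default value of just 2 for client applications. Set it to something much bigger. On .NET Core, HttpClient does not read that setting; the equivalent knob is HttpClientHandler.MaxConnectionsPerServer, which is effectively unlimited by default.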
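As a rough sketch of raising the limit, something like the following could be run once at startup. The HttpSetup class, the CreateClient helper, and the value 100 are all made up for illustration; only ServicePointManager.DefaultConnectionLimit and HttpClientHandler.MaxConnectionsPerServer are real APIs. Setting both is an assumption made here so the same code covers the full framework and .NET Core, and MaxConnectionsPerServer needs a reasonably recent framework version.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class HttpSetup
{
    // Hypothetical helper. The default of 100 is an arbitrary starting
    // point, not a recommendation from the original answer.
    public static HttpClient CreateClient(int maxConnectionsPerServer = 100)
    {
        // Full framework: raises the per-host limit used by HttpClient.
        // Newer .NET versions may flag this type as obsolete.
        ServicePointManager.DefaultConnectionLimit = maxConnectionsPerServer;

        // .NET Core: the per-host limit lives on the handler instead.
        var handler = new HttpClientHandler
        {
            MaxConnectionsPerServer = maxConnectionsPerServer
        };

        // Create this client once and reuse it for every request.
        return new HttpClient(handler);
    }
}
```

Both settings are per host, so a job that hits many different servers can end up with far more open connections in total than the number passed in.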

Next I would suggest setting up your code to perform the downloads concurrently up to some limit, using either SemaphoreSlim or TPL Dataflow. Both approaches are well covered in answers to this question. Then start experimenting until you come up with an optimal number. Hard to say what that is. Maybe start with 50. If it goes well, increase it to 100 and see if the overall job completes any faster. If you start getting socket exceptions or errors returned from the server, dial it down.
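A minimal sketch of the SemaphoreSlim variant, assuming the downloads only need to be capped at a fixed number in flight. ThrottledDownloader and RunThrottledAsync are hypothetical names, and the fetch delegate is an assumption made here so the throttling logic stays separate from the HTTP call. Error handling and retries are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloader
{
    // Runs fetch for every URL, with at most maxConcurrency calls in flight.
    // Results come back in the same order as the input URLs.
    public static async Task<T[]> RunThrottledAsync<T>(
        IEnumerable<string> urls,
        int maxConcurrency,
        Func<string, Task<T>> fetch)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            // ToList() starts every task now; each one waits at the gate.
            var tasks = urls.Select(url => RunOneAsync(url, gate, fetch)).ToList();
            return await Task.WhenAll(tasks);
        }
    }

    private static async Task<T> RunOneAsync<T>(
        string url, SemaphoreSlim gate, Func<string, Task<T>> fetch)
    {
        await gate.WaitAsync();      // wait for a free slot
        try
        {
            return await fetch(url); // the actual download
        }
        finally
        {
            gate.Release();          // free the slot even if fetch throws
        }
    }
}
```

With a shared HttpClient named client, a call could look like `await ThrottledDownloader.RunThrottledAsync(urls, 50, url => client.GetStringAsync(url))`. Changing the 50 between runs is then the only thing needed for the experiment described above.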

Todd Menier answered Oct 04 '22 23:10