I have a list of URLs of pages I want to download concurrently using HttpClient
. The list of URLs can be large (100 or more!)
I have currently have this code:
var urls = new List<string>
{
@"http:\\www.amazon.com",
@"http:\\www.bing.com",
@"http:\\www.facebook.com",
@"http:\\www.twitter.com",
@"http:\\www.google.com"
};
var client = new HttpClient();
var contents = urls
.ToObservable()
.SelectMany(uri => client.GetStringAsync(new Uri(uri, UriKind.Absolute)));
contents.Subscribe(Console.WriteLine);
The problem: due to the usage of SelectMany
, a big bunch of Tasks are created almost at the same time. It seems that if the list of URLs is big enough, a lot Tasks give timeouts (I'm getting "A Task was cancelled" exceptions).
So, I thought there should be a way, maybe using some kind of Scheduler, to limit the number of concurrent Tasks, not allowing more than 5 or 6 at a given time.
This way I could get concurrent downloads without launching too many tasks that may get stall, like they do right now.
How to do that so I don't saturate with lots of timed-out Tasks?
Remember SelectMany()
is actually Select().Merge()
. While SelectMany
does not have a maxConcurrent
paramter, Merge()
does. So you can use that.
From your example, you can do this:
var urls = new List<string>
{
@"http:\\www.amazon.com",
@"http:\\www.bing.com",
@"http:\\www.facebook.com",
@"http:\\www.twitter.com",
@"http:\\www.google.com"
};
var client = new HttpClient();
var contents = urls
.ToObservable()
.Select(uri => Observable.FromAsync(() => client.GetStringAsync(uri)))
.Merge(2); // 2 maximum concurrent requests!
contents.Subscribe(Console.WriteLine);
Here is an example of how you can do it with the DataFlow API:
private static Task DoIt()
{
var urls = new List<string>
{
@"http:\\www.amazon.com",
@"http:\\www.bing.com",
@"http:\\www.facebook.com",
@"http:\\www.twitter.com",
@"http:\\www.google.com"
};
var client = new HttpClient();
//Create a block that takes a URL as input
//and produces the download result as output
TransformBlock<string,string> downloadBlock =
new TransformBlock<string, string>(
uri => client.GetStringAsync(new Uri(uri, UriKind.Absolute)),
new ExecutionDataflowBlockOptions
{
//At most 2 download operation execute at the same time
MaxDegreeOfParallelism = 2
});
//Create a block that prints out the result
ActionBlock<string> doneBlock =
new ActionBlock<string>(x => Console.WriteLine(x));
//Link the output of the first block to the input of the second one
downloadBlock.LinkTo(
doneBlock,
new DataflowLinkOptions { PropagateCompletion = true});
//input the urls into the first block
foreach (var url in urls)
{
downloadBlock.Post(url);
}
downloadBlock.Complete(); //Mark completion of input
//Allows consumer to wait for the whole operation to complete
return doneBlock.Completion;
}
static void Main(string[] args)
{
DoIt().Wait();
Console.WriteLine("Done");
Console.ReadLine();
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With