More efficient way of calling GetStringAsync multiple times?

I have the code below (my URL list is about 1,000 URLs). I was wondering if there is a more efficient way to call multiple URLs from the same site (I am already changing ServicePointManager.DefaultConnectionLimit).

Also, is it better to reuse the same HttpClient or to create a new one on every call? The code below uses a single client instead of several.

using (var client = new HttpClient { Timeout = TimeSpan.FromMinutes(5) })
{
    var tasks = urls.Select(async url =>
    {
        // await the response directly instead of chaining ContinueWith
        var resultHtml = await client.GetStringAsync(url);
        //process the html
    }).ToList();

    // block until every download has finished
    Task.WaitAll(tasks.ToArray());
}

As suggested by @cory, here is the modified code using TPL Dataflow. However, I have to set MaxDegreeOfParallelism = 100 to get approximately the same speed as the Task-based version. Can the code below be improved?

var downloader = new ActionBlock<string>(async url =>
{
    // WebClient is IDisposable, so dispose it after each download
    using (var client = new WebClient())
    {
        var resultHtml = await client.DownloadStringTaskAsync(new Uri(url));
        //process the html
    }
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 100 });

foreach (var url in urls)
{
    downloader.Post(url);
}
downloader.Complete();
downloader.Completion.Wait();

Final version:

public void DownloadUrlContents(List<string> urls)
{
    var watch = Stopwatch.StartNew();

    var httpClient = new HttpClient();
    var downloader = new ActionBlock<string>(async url =>
    {
        var data = await httpClient.GetStringAsync(url);
        //process the html
    }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 100 });

    // the ActionBlock is unbounded, so Post always accepts the URL;
    // a plain loop is enough to queue the work
    foreach (var url in urls)
    {
        downloader.Post(url);
    }
    downloader.Complete();
    downloader.Completion.Wait();

    Console.WriteLine($"{MethodBase.GetCurrentMethod().Name} {watch.Elapsed}");
}
Zoinky asked Mar 08 '17 02:03



2 Answers

Although your code will work, it is common practice to put a BufferBlock in front of your ActionBlock. There are two reasons for this. First, queue size: you can easily limit the number of messages waiting in your queue. Second, adding a message to the buffer is almost instant, and after that it is TPL Dataflow's responsibility to handle all your items:

// async method here
public async Task DownloadUrlContents(List<string> urls)
{
    var watch = Stopwatch.StartNew();

    var httpClient = new HttpClient();

    // you may limit the buffer size here
    var buffer = new BufferBlock<string>();
    var downloader = new ActionBlock<string>(async url =>
        {
            var data = await httpClient.GetStringAsync(url);
            // handle data here
        },
        // note the processor count usage here
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount });
    // tell TPL Dataflow to forward messages from the buffer to the downloader
    buffer.LinkTo(downloader, new DataflowLinkOptions {PropagateCompletion = true});

    foreach (var url in urls)
    {
        // do await here
        await buffer.SendAsync(url);
    }
    // queue is done
    buffer.Complete();

    // now it's safe to wait for completion of the downloader
    await downloader.Completion;

    Console.WriteLine($"{MethodBase.GetCurrentMethod().Name} {watch.Elapsed}");
}
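A buffer only levels the queue if it is bounded. The sketch below assumes the System.Threading.Tasks.Dataflow package used above; `BoundedPipelineSketch`, `RunAsync` and `processItemAsync` are hypothetical names, with `processItemAsync` standing in for the download and HTML handling. Both blocks get a `BoundedCapacity`, so `await buffer.SendAsync(...)` only completes when there is room and the producer slows down instead of queueing all 1,000 URLs at once.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: bounded BufferBlock feeding a bounded ActionBlock.
public static class BoundedPipelineSketch
{
    // Returns how many items the ActionBlock processed.
    // Assumes processItemAsync handles its own exceptions; a faulted worker
    // would stop draining the buffer and SendAsync would wait indefinitely.
    public static async Task<int> RunAsync(
        IEnumerable<string> items,
        Func<string, Task> processItemAsync,
        int boundedCapacity,
        int maxDegreeOfParallelism)
    {
        var processed = 0;

        // At most boundedCapacity messages wait in the buffer at any time.
        var buffer = new BufferBlock<string>(
            new DataflowBlockOptions { BoundedCapacity = boundedCapacity });

        var worker = new ActionBlock<string>(async item =>
        {
            await processItemAsync(item);
            Interlocked.Increment(ref processed);
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism,
            // Keep the worker's own input queue small as well, so the
            // backpressure actually reaches the BufferBlock.
            BoundedCapacity = maxDegreeOfParallelism
        });

        buffer.LinkTo(worker, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var item in items)
        {
            // SendAsync completes only once the buffer has room for the item.
            if (!await buffer.SendAsync(item))
                throw new InvalidOperationException("The buffer declined the message.");
        }

        buffer.Complete();
        await worker.Completion;
        return processed;
    }
}
```

The capacity values are arbitrary here; they would need tuning against the site being crawled.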
VMAtm answered Sep 24 '22 09:09



Essentially, reusing the HttpClient is better: you don't have to authenticate every time you send a request, and you can keep session state in cookies (unless you initialize each new instance with a token or cookies anyway). Other than that, it all comes down to the ServicePoint, where you can set the maximum number of concurrent connections.
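A sketch of the reuse advice: one shared instance, with the connection limit set on its handler. The class name `SharedHttp` and the limit of 20 are illustrative assumptions. `HttpClientHandler.MaxConnectionsPerServer` caps concurrent connections to a single server; on .NET Core and .NET 5+ HttpClient generally ignores `ServicePointManager`, so this handler setting is the one that applies there.

```csharp
using System;
using System.Net.Http;

// Hypothetical holder for one HttpClient shared by the whole application.
public static class SharedHttp
{
    // Builds the handler that owns the connection pool.
    public static HttpClientHandler CreateHandler(int maxConnectionsPerServer)
    {
        return new HttpClientHandler
        {
            // Caps how many connections may be open to one server at once.
            MaxConnectionsPerServer = maxConnectionsPerServer,
            // Cookies set by the site are kept and sent on later requests.
            UseCookies = true
        };
    }

    // Created once and reused for every request; 20 is an illustrative
    // value, not a recommendation.
    public static readonly HttpClient Client =
        new HttpClient(CreateHandler(maxConnectionsPerServer: 20))
        {
            Timeout = TimeSpan.FromMinutes(5)
        };
}
```

Every call site then uses `SharedHttp.Client.GetStringAsync(url)` rather than creating its own client.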

To make the calls in parallel in a more maintainable way, I would suggest the AsyncEnumerator NuGet package, which lets you write code like this:

using System.Collections.Async;

await uris.ParallelForEachAsync(
    async uri =>
    {
        var html = await httpClient.GetStringAsync(uri, cancellationToken);
        // process HTML
    },
    maxDegreeOfParallelism: 5,
    breakLoopOnException: false,
    cancellationToken: cancellationToken);
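If adding a package is not an option, .NET 6 and later include `Parallel.ForEachAsync`, which throttles async work in a similar way. This is a swapped-in alternative, not how the package works; `ParallelDownloadSketch`, `DownloadAllAsync` and the injected `fetchAsync` delegate are hypothetical, and in real code `fetchAsync` could be `(url, ct) => httpClient.GetStringAsync(url, ct)` (that `CancellationToken` overload needs .NET 5+). Unlike `breakLoopOnException: false`, `Parallel.ForEachAsync` stops starting new iterations after the first exception and surfaces that failure.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper built on Parallel.ForEachAsync (.NET 6+).
public static class ParallelDownloadSketch
{
    // Runs fetchAsync for every URL with at most maxDegreeOfParallelism calls
    // in flight and returns the results keyed by URL.
    public static async Task<IReadOnlyDictionary<string, string>> DownloadAllAsync(
        IEnumerable<string> urls,
        Func<string, CancellationToken, Task<string>> fetchAsync,
        int maxDegreeOfParallelism,
        CancellationToken cancellationToken = default)
    {
        var results = new ConcurrentDictionary<string, string>();

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism,
            CancellationToken = cancellationToken
        };

        await Parallel.ForEachAsync(urls, options, async (url, ct) =>
        {
            var html = await fetchAsync(url, ct);
            // process the HTML here instead of storing it, if preferred
            results[url] = html;
        });

        return results;
    }
}
```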
Serge Semenov answered Sep 25 '22 09:09
