C# Download data from huge list of urls [duplicate]

I have a huge list of web pages, each of which displays a status that I need to check. Some URLs are on the same site; others are on a different site.

Right now I'm trying to do this in parallel with code like the one below, but I have the feeling that I'm causing too much overhead.

while(ListOfUrls.Count > 0){
  Parallel.ForEach(ListOfUrls, url =>
  {
    WebClient webClient = new WebClient();
    webClient.DownloadString(url);
    // ... run my checks here ...
  });

  ListOfUrls = GetNewUrls.....
}

Can this be done with less overhead, and with more control over how many WebClient instances and connections I use and reuse, so that in the end the job gets done faster?

Tys asked Mar 22 '23 00:03


2 Answers

Parallel.ForEach is good for CPU-bound computational tasks, but it will unnecessarily block pool threads for synchronous IO-bound calls like DownloadString in your case. You can improve the scalability of your code, and reduce the number of threads it may use, by using DownloadStringTaskAsync and tasks instead:

// non-blocking async method
async Task<string> ProcessUrlAsync(string url)
{
    using (var webClient = new WebClient())
    {
        string data = await webClient.DownloadStringTaskAsync(new Uri(url));
        // run checks here.. 
        return data;
    }
}

// ...

if (ListOfUrls.Count > 0) {
    var tasks = new List<Task>();
    foreach (var url in ListOfUrls)
    {
        tasks.Add(ProcessUrlAsync(url));
    }

    Task.WaitAll(tasks.ToArray()); // blocking wait

    // could use await here and make this method async:
    // await Task.WhenAll(tasks.ToArray());
}
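
One way to get the control over concurrency that the question asks about is to put a gate in front of the async calls. The following is a minimal sketch, not part of the original answer: `ThrottledRunner`, `RunAsync`, and the `maxConcurrency` parameter are hypothetical names, and the fetch step is passed in as a delegate so that `ProcessUrlAsync` above (or any other downloader) can be plugged in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not from the original answer).
public static class ThrottledRunner
{
    // Calls fetchAsync once per URL, with at most maxConcurrency calls in flight.
    // Results are returned in the same order as the input URLs.
    public static async Task<string[]> RunAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetchAsync,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                // Asynchronously wait for a free slot; no thread is blocked here.
                await gate.WaitAsync();
                try
                {
                    return await fetchAsync(url);
                }
                finally
                {
                    // Free the slot even if the download throws.
                    gate.Release();
                }
            }).ToList(); // ToList forces the lazy Select to start every task now

            // Completes only after every task has finished, so the gate
            // is not disposed while a task still holds a slot.
            return await Task.WhenAll(tasks);
        }
    }
}
```

Assuming the `ProcessUrlAsync` method above and a `ListOfUrls` that is a list of strings, a call could look like `string[] pages = await ThrottledRunner.RunAsync(ListOfUrls, ProcessUrlAsync, 10);`, where 10 is an arbitrary starting value to tune. On .NET Framework, `ServicePointManager.DefaultConnectionLimit` separately limits simultaneous connections per host, so it may also need raising before a higher `maxConcurrency` has any effect.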
noseratio answered Apr 01 '23 17:04


You can try using HttpClient, a new addition in .NET 4.5. It is considered to be faster, and it might improve your performance a little.

using (HttpClient client = new HttpClient())
using (HttpResponseMessage response = await client.GetAsync(url))
using (HttpContent content = response.Content)
{
    string result = await content.ReadAsStringAsync();
}
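
Since the question also asks about reusing clients and connections, here is a hedged sketch of sharing one HttpClient across the whole list instead of creating one per URL. `StatusChecker`, `DownloadAsync`, and `DownloadAllAsync` are hypothetical names that do not appear in the answer; the client is passed in by the caller so that its lifetime and configuration stay under the caller's control.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical wrapper (not from the original answer).
public sealed class StatusChecker
{
    // A single HttpClient is shared by all requests, which lets
    // connections to the same host be reused between downloads.
    private readonly HttpClient _client;

    public StatusChecker(HttpClient client)
    {
        _client = client;
    }

    // Downloads one page body as a string.
    public async Task<string> DownloadAsync(string url)
    {
        using (HttpResponseMessage response = await _client.GetAsync(url))
        {
            string result = await response.Content.ReadAsStringAsync();
            // ... run checks on result here ...
            return result;
        }
    }

    // Starts all downloads and returns their bodies in input order.
    // Note: this sketch applies no concurrency limit of its own.
    public Task<string[]> DownloadAllAsync(IEnumerable<string> urls)
    {
        return Task.WhenAll(urls.Select(url => DownloadAsync(url)));
    }
}
```

A caller would typically create one `HttpClient` for the lifetime of the job, pass it to `new StatusChecker(client)`, and dispose it only after all downloads have completed.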
COLD TOLD answered Apr 01 '23 16:04