I have a list of 100 URLs and need to fetch the HTML content of each. Let's say I don't use the async version of DownloadString and instead do the following:
var task1 = Task.Factory.StartNew(() => new WebClient().DownloadString("url1"));
What I want to achieve is to fetch the HTML for at most 4 URLs at a time.
I start 4 tasks for the first four URLs. When any one of them completes (say the 2nd), I want to immediately start a 5th task for the 5th URL, and so on. This way at most 4 URLs are being downloaded at any moment, and for all practical purposes there are always 4 downloads in progress until all 100 have been processed.
I can't seem to visualize how I would actually achieve this. There must be an established pattern for doing this. Thoughts?
EDIT:
Following up on @Damien_The_Unbeliever's comment to use Parallel.ForEach, I wrote the following:
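var urls = new List<string>(); // to be filled with the 100 URLs
var results = new Dictionary<string, string>();
var lockObj = new object();
Parallel.ForEach(urls,
    new ParallelOptions { MaxDegreeOfParallelism = 4 }, // at most 4 downloads at once
    url =>
    {
        string str;
        using (var client = new WebClient()) // dispose the client after each download
        {
            str = client.DownloadString(url);
        }
        // Dictionary<TKey, TValue> is not thread-safe, so writes are serialized.
        lock (lockObj)
        {
            results[url] = str;
        }
    });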
I think the above reads better than creating individual tasks and using a semaphore to limit concurrency. That said, having never used or worked with Parallel.ForEach, I am unsure whether this correctly does what I need.
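One way to check the concurrency limit empirically is to count how many loop bodies run at the same time. The following is only a sketch: the class and method names (ParallelLimitCheck, MeasurePeakConcurrency) are made up for illustration, and Thread.Sleep stands in for the blocking DownloadString call so that nothing touches the network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLimitCheck
{
    // Runs a simulated "download" per URL and returns the highest number of
    // loop bodies that were observed executing at the same time.
    public static int MeasurePeakConcurrency(IEnumerable<string> urls, int maxParallel)
    {
        int current = 0; // bodies running right now
        int peak = 0;    // highest value of 'current' seen so far

        Parallel.ForEach(
            urls,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallel },
            url =>
            {
                int now = Interlocked.Increment(ref current);

                // Lock-free "peak = max(peak, now)".
                int seen;
                while (now > (seen = Volatile.Read(ref peak)))
                {
                    Interlocked.CompareExchange(ref peak, now, seen);
                }

                Thread.Sleep(20); // assumption: stand-in for DownloadString(url)
                Interlocked.Decrement(ref current);
            });

        return peak;
    }

    public static void Main()
    {
        var urls = Enumerable.Range(1, 100).Select(i => "url" + i).ToList();
        Console.WriteLine("Peak concurrency: " + MeasurePeakConcurrency(urls, 4));
    }
}
```

Note that MaxDegreeOfParallelism is an upper bound: the reported peak should never exceed 4, but the scheduler is free to use fewer workers, so this does not by itself guarantee that 4 downloads are always in progress.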
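SemaphoreSlim sem = new SemaphoreSlim(4); // at most 4 downloads in flight
foreach (var url in urls)
{
    sem.Wait(); // blocks until one of the 4 slots is free
    Task.Factory.StartNew(() => new WebClient().DownloadString(url))
        .ContinueWith(t => sem.Release()); // free the slot whether the download succeeded or failed
}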
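The loop above starts the downloads but neither keeps their results nor waits for the last few to finish. A possible variation, sketched below, collects the results and awaits everything with Task.WhenAll. It swaps the blocking sem.Wait() for SemaphoreSlim.WaitAsync, so it assumes .NET 4.5 or later. ThrottledFetcher, FetchAllAsync and the fetch delegate are hypothetical names for illustration; the delegate is a stand-in for the real download, and the demo uses a fake one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledFetcher
{
    // 'fetch' is a hypothetical stand-in for the real download, for example
    // url => Task.Run(() => new WebClient().DownloadString(url)).
    public static async Task<Dictionary<string, string>> FetchAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            // ToList() forces every task to be created now; each one then
            // waits at the gate until one of the slots is free.
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();
                try
                {
                    return new KeyValuePair<string, string>(url, await fetch(url));
                }
                finally
                {
                    gate.Release(); // free the slot even if the fetch failed
                }
            }).ToList();

            var pairs = await Task.WhenAll(tasks);

            var results = new Dictionary<string, string>();
            foreach (var pair in pairs)
            {
                results[pair.Key] = pair.Value; // a duplicate URL overwrites its earlier entry
            }
            return results;
        }
    }

    public static void Main()
    {
        var urls = Enumerable.Range(1, 100).Select(i => "url" + i).ToList();

        // Fake download so the sketch runs without network access.
        var results = FetchAllAsync(urls, async url =>
        {
            await Task.Delay(10);
            return "<html>" + url + "</html>";
        }, 4).GetAwaiter().GetResult();

        Console.WriteLine(results.Count);
    }
}
```

Because every task releases its slot in a finally block, a failed download does not permanently shrink the pool of 4; the exception surfaces when Task.WhenAll is awaited.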