Trying to download huge amount of files more efficiently

I'm trying to download approx. 45,000 image files from an API. Each image file is less than 50 KB. With my code this takes 2-3 hours.

Is there a more efficient way in C# to download them?

private static readonly string baseUrl =
    "http://url.com/Handlers/Image.ashx?imageid={0}&type=image";
internal static void DownloadAllMissingPictures(List<ListObject> ImagesToDownload,
    string imageFolderPath)
{
    Parallel.ForEach(Partitioner.Create(0, ImagesToDownload.Count), range =>
    {
        for (var i = range.Item1; i < range.Item2; i++)
        {
            string ImageID = ImagesToDownload[i].ImageId;

            using (var webClient = new WebClient())
            {
                string url = String.Format(baseUrl, ImageID);
                string file = String.Format(@"{0}\{1}.jpg", imageFolderPath,
                    ImagesToDownload[i].ImageId);

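                // DownloadData is synchronous, so each worker thread blocks for the full round trip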
                byte[] data = webClient.DownloadData(url);

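                // The downloaded JPEG is decoded with System.Drawing and re-encoded before saving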
                using (MemoryStream mem = new MemoryStream(data))
                {
                    using (var image = Image.FromStream(mem))
                    {
                        image.Save(file, ImageFormat.Jpeg);
                    }
                }                    
            }
        }
    });
}
asked Dec 30 '25 by Smutjes

1 Answer

I tested some variations of your suggestions. The code by Theodor Zoulias was my favourite.

It works fine and fast, at approx. 1,200 downloads per minute.

This is the final code I'm using now:

    private static readonly string _baseUrlPattern = "http://url.com/Handlers/Image.ashx?imageId={0}&type=card";

    private static readonly HttpClient _httpClient = new HttpClient();

    internal static void DownloadAllMissingPictures(CancellationToken cancellationToken = default)
    {
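        // Note: on .NET 6+ (needed for Parallel.ForEachAsync) this only affects the legacy
        // HttpWebRequest/WebClient stack; HttpClient connection limits are set on its handler.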
        ServicePointManager.DefaultConnectionLimit = 8;

        var parallelOptions = new ParallelOptions()
        {
            MaxDegreeOfParallelism = 10,
            CancellationToken = cancellationToken,
        };
        Parallel.ForEachAsync(ListWithImagesToDownload, parallelOptions, async (image, ct) =>
        {
            string imageId = image.identifiers.ImageId;
            string url = String.Format(_baseUrlPattern, imageId);
            string filePath = Path.Combine(imageFolderPath, $"{imageId}.jpg");

            using HttpResponseMessage response = await _httpClient.GetAsync(url, ct);
            response.EnsureSuccessStatusCode();

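            // File.OpenWrite does not truncate an existing file, so this assumes the target
            // file does not already exist; File.Create would be the safer choice on re-runs.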
            using FileStream fileStream = File.OpenWrite(filePath);
            await response.Content.CopyToAsync(fileStream);
        }).Wait();
    }
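
For completeness, the method above relies on ListWithImagesToDownload and imageFolderPath being defined elsewhere in the class. A minimal sketch of what those members could look like (the class shapes and the folder path below are assumptions inferred from how the code uses them, not the exact definitions):

    // Sketch of the members the method above depends on.
    // Shapes are assumed from usage (image.identifiers.ImageId); the path is a placeholder.
    private static readonly string imageFolderPath = @"C:\Images";

    private static readonly List<ListObject> ListWithImagesToDownload = new();

    public class ListObject
    {
        public ImageIdentifiers identifiers { get; set; }
    }

    public class ImageIdentifiers
    {
        public string ImageId { get; set; }
    }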

The code idea by TomTom is fine, but it stops after one loop, so I can't tell you what impact MaxConnectionsPerServer has on the download speed.
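
For anyone who does want to experiment with MaxConnectionsPerServer: with HttpClient on .NET 6+ it is configured on the SocketsHttpHandler the client is built from, rather than through ServicePointManager. A minimal sketch, with 8 as an example value:

    // Sketch: capping concurrent connections per server for the shared HttpClient.
    private static readonly HttpClient _httpClient = new HttpClient(
        new SocketsHttpHandler
        {
            MaxConnectionsPerServer = 8,                       // example value
            PooledConnectionLifetime = TimeSpan.FromMinutes(5) // recycle pooled connections
        });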

I'm sorry I can't share more experience with you, but as I said, I'm still a beginner with less than one year of programming experience.

answered Jan 02 '26 by Smutjes


