CloudBlockBlob.DownloadToStream vs DownloadRangeToStream

I am trying to use the Azure Storage SDK from ASP.NET to download images from blob storage.

I read in another post that DownloadToStream breaks a blob up into smaller pieces and downloads them in parallel to increase performance. I thought that was what DownloadRangeToStream is for.

I have not been able to find any documentation or code confirming this statement about DownloadToStream, and I am skeptical because it has the same runtime as downloading straight from the blob URL (0.5-3 s per download). Here is the code for both of my download methods, which perform about the same.

Using CloudBlockBlob.DownloadToStream:

private Bitmap DownloadFromBlob(String set) {

    CloudStorageAccount storageAccount = CloudStorageAccount.Parse( CloudConfigurationManager.GetSetting("StorageConnectionString"));

    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("templates");

    CloudBlockBlob blockBlob = container.GetBlockBlobReference(set + ".png");

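    using (var memoryStream = new MemoryStream()) {
        blockBlob.DownloadToStream(memoryStream);

        // Rewind before decoding: the download leaves Position at the end of the stream.
        memoryStream.Position = 0;

        // Copy into a new Bitmap so the result does not depend on the disposed stream.
        using (var image = Image.FromStream(memoryStream)) {
            return new Bitmap(image);
        }
    }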
}
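
A side note on the stream handling above, separate from the performance question: a MemoryStream that has just been written to has its Position at the end, so a reader that starts from the current position sees no data. The sketch below uses only MemoryStream (no Azure types) and hypothetical helper names to show the difference that rewinding makes. Whether Image.FromStream rewinds the stream by itself is not established in this post, so setting Position = 0 explicitly is the cautious assumption.

```csharp
using System;
using System.IO;

static class StreamPositionDemo
{
    // Writes the payload, then reads WITHOUT rewinding.
    public static int ReadWithoutRewind(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);      // Position == payload.Length
            var buffer = new byte[payload.Length];
            return ms.Read(buffer, 0, buffer.Length);  // nothing left to read
        }
    }

    // Writes the payload, rewinds, then reads.
    public static int ReadAfterRewind(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);
            ms.Position = 0;                           // back to the first byte
            var buffer = new byte[payload.Length];
            return ms.Read(buffer, 0, buffer.Length);
        }
    }

    static void Main()
    {
        var payload = new byte[] { 1, 2, 3, 4 };
        Console.WriteLine(ReadWithoutRewind(payload)); // prints 0
        Console.WriteLine(ReadAfterRewind(payload));   // prints 4
    }
}
```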

Using WebClient.DownloadData with Image.FromStream:

private Bitmap DownloadImageFromUrl(string url) {
    try {
        using (WebClient client = new WebClient()) {
            byte[] data = client.DownloadData(url);
            if (data == null) return null;
            using (var mem = new MemoryStream(data))
            using (var image = Image.FromStream(mem)) {
                // Copy so the returned Bitmap does not depend on the disposed stream.
                return new Bitmap(image);
            }
        }
    } catch (WebException) {
        return null;
    }
}

I am trying to reduce the download time of images that range from 0.5 to 12 MB. I tried implementing my own parallel download with DownloadRangeToStream for these images; the code is below. Do I need to do this, or does DownloadToStream already do it for me? This method yields the same runtime as the DownloadFromBlob method above.

Using DownloadRangeToStream:

private Image getImageFromStream(string set)
{
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        CloudConfigurationManager.GetSetting("StorageConnectionString"));

    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    CloudBlobContainer container = blobClient.GetContainerReference("templates");

    CloudBlockBlob blockBlob = container.GetBlockBlobReference(set + ".png");

    using (MemoryStream ms = new MemoryStream())
    {
        ParallelDownloadBlob(ms, blockBlob);

        // Rewind before decoding: the chunk writes leave Position past the start.
        ms.Position = 0;

        // Copy into a new Bitmap so the result does not depend on the disposed stream.
        using (var image = Image.FromStream(ms))
        {
            return new Bitmap(image);
        }
    }
}

private static void ParallelDownloadBlob(Stream outPutStream, CloudBlockBlob blob)
{
    blob.FetchAttributes();
    int bufferLength = 1 * 1024 * 1024; // 1 MB chunk
    long blobRemainingLength = blob.Properties.Length;

    // Each entry is (offset, length) for one range request.
    Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
    long offset = 0;
    while (blobRemainingLength > 0)
    {
        long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
        queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
        offset += chunkLength;
        blobRemainingLength -= chunkLength;
    }

    Parallel.ForEach(queues,
        new ParallelOptions()
        {
            // Gets or sets the maximum number of concurrent tasks
            MaxDegreeOfParallelism = 10
        },
        (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                blob.DownloadRangeToStream(ms, queue.Key, queue.Value);

                // The output stream is shared, so seek and write under a lock.
                lock (outPutStream)
                {
                    outPutStream.Position = queue.Key;
                    var bytes = ms.ToArray();
                    outPutStream.Write(bytes, 0, bytes.Length);
                }
            }
        });
}
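
The chunking loop at the top of ParallelDownloadBlob can be checked on its own, without touching storage. The sketch below is a minimal, self-contained version of that arithmetic; the class and method names (RangeSplitter, Split) are hypothetical and not part of the SDK. With 1 MB chunks, a 2.5 MB blob yields three ranges, and a 12 MB image would yield twelve range requests in addition to the FetchAttributes call.

```csharp
using System;
using System.Collections.Generic;

static class RangeSplitter
{
    // Splits [0, totalLength) into consecutive (offset, length) pairs,
    // each at most chunkSize bytes long.
    public static List<KeyValuePair<long, long>> Split(long totalLength, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var ranges = new List<KeyValuePair<long, long>>();
        long offset = 0;
        while (offset < totalLength)
        {
            long length = Math.Min(chunkSize, totalLength - offset);
            ranges.Add(new KeyValuePair<long, long>(offset, length));
            offset += length;
        }
        return ranges;
    }

    static void Main()
    {
        const int oneMb = 1024 * 1024;
        long blobLength = 2 * oneMb + oneMb / 2; // 2.5 MB = 2621440 bytes

        foreach (var range in Split(blobLength, oneMb))
            Console.WriteLine($"offset={range.Key} length={range.Value}");
        // prints:
        // offset=0 length=1048576
        // offset=1048576 length=1048576
        // offset=2097152 length=524288
    }
}
```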
Jaked222 asked Jan 23 '17 15:01



1 Answer

As far as I understand, both CloudBlockBlob.DownloadToStream and your WebClient-based method (the one ending in Image.FromStream) send only a single request to download the blob. You can confirm this by capturing the traffic with Fiddler.

With DownloadRangeToStream, you can break the blob up into smaller pieces yourself and download them in parallel to increase performance. Here is my code snippet for reference:

private static void ParallelDownloadBlob(Stream outPutStream, CloudBlockBlob blob)
{
    blob.FetchAttributes();
    int bufferLength = 1 * 1024 * 1024;//1 MB chunk
    long blobRemainingLength = blob.Properties.Length;
    Queue<KeyValuePair<long, long>> queues = new Queue<KeyValuePair<long, long>>();
    long offset = 0;
    while (blobRemainingLength > 0)
    {
        long chunkLength = (long)Math.Min(bufferLength, blobRemainingLength);
        queues.Enqueue(new KeyValuePair<long, long>(offset, chunkLength));
        offset += chunkLength;
        blobRemainingLength -= chunkLength;
    }
    Parallel.ForEach(queues,
        new ParallelOptions()
        {
            // Gets or sets the maximum number of concurrent tasks
            MaxDegreeOfParallelism = 10
        },
        (queue) =>
        {
            using (var ms = new MemoryStream())
            {
                blob.DownloadRangeToStream(ms, queue.Key, queue.Value);
                lock (outPutStream)
                {
                    outPutStream.Position = queue.Key;
                    var bytes = ms.ToArray();
                    outPutStream.Write(bytes, 0, bytes.Length);
                }
            }
        });
}

Result:

Additionally, there are some blog posts about uploading and downloading blobs in parallel that you could refer to (blog1 and blog2).
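
One possible variation on the snippet above, offered as a sketch rather than a measured improvement: because the ranges never overlap, each chunk can be copied straight into its own slice of a preallocated byte array, which avoids the lock around the shared stream. The fetchRange delegate here is a hypothetical stand-in for a call such as blob.DownloadRangeToStream, and the class and method names are assumptions made for illustration. The example fakes the blob with an in-memory array so it runs without the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class ParallelAssemblyDemo
{
    // fetchRange(offset, length) is a hypothetical callback that returns
    // `length` bytes of the blob starting at `offset`.
    public static byte[] Download(long totalLength, int chunkSize,
                                  Func<long, long, byte[]> fetchRange,
                                  int maxParallelism = 10)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var result = new byte[totalLength];
        var ranges = new List<KeyValuePair<long, long>>();
        for (long offset = 0; offset < totalLength; offset += chunkSize)
        {
            long length = Math.Min(chunkSize, totalLength - offset);
            ranges.Add(new KeyValuePair<long, long>(offset, length));
        }

        Parallel.ForEach(ranges,
            new ParallelOptions { MaxDegreeOfParallelism = maxParallelism },
            range =>
            {
                byte[] chunk = fetchRange(range.Key, range.Value);
                // Ranges are disjoint, so concurrent copies never touch the same bytes.
                Array.Copy(chunk, 0, result, range.Key, chunk.Length);
            });

        return result;
    }

    static void Main()
    {
        // Fake "blob": 2500 bytes of patterned data held in memory.
        var source = new byte[2500];
        for (int i = 0; i < source.Length; i++) source[i] = (byte)(i % 251);

        byte[] copy = Download(source.Length, 1000, (offset, length) =>
        {
            var chunk = new byte[length];
            Array.Copy(source, offset, chunk, 0, length);
            return chunk;
        });

        Console.WriteLine(copy.Length);                // prints 2500
        Console.WriteLine(source.SequenceEqual(copy)); // prints True
    }
}
```

The trade-off is that the whole blob is held in a single array, which is reasonable for images of a few megabytes but would need rethinking for much larger blobs.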

Bruce Chen answered Sep 30 '22 06:09