I'm trying to figure out the best-performing approach for writing thousands of small blobs to Azure Storage. The application scenario is the following:
Please note that, due to several constraints I'm not listing for brevity, it's not currently possible to modify the main service to create blobs directly instead of files on the temporary file system. ...and from what I'm currently seeing, that would mean a slower creation rate, which is not acceptable per the original requirements.
This copy operation, which I'm testing in a tight loop on 10,000 files, seems to be limited to 200 blob creations per second. I was able to reach this rate after tweaking the sample code named "Windows Azure ImportExportBlob" found here: http://code.msdn.microsoft.com/windowsazure/Windows-Azure-ImportExportB-9d30ddd5 with the async suggestions found in this answer: Using Parallel.Foreach in a small azure instance
I obtained this apparent maximum of 200 blob creations per second on an extra-large VM with 8 cores, with the "maxConcurrentThingsToProcess" semaphore set accordingly. Network utilization during the test peaks at 1% of the 10 Gbps link shown in Task Manager, i.e. roughly 100 Mbps of the 800 Mbps that should be available on that VM size. A sketch of the throttled loop I'm using follows.
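For context, here is a minimal sketch of that throttled loop. It assumes the Microsoft.WindowsAzure.Storage client (exact method signatures vary by SDK version); the container, the file list, and the limit of 64 are hypothetical stand-ins for my actual code:

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.WindowsAzure.Storage.Blob;

    class ThrottledUploader
    {
        // Hypothetical limit, tuned to the core count as described above.
        static readonly SemaphoreSlim maxConcurrentThingsToProcess = new SemaphoreSlim(64);

        static async Task UploadAllAsync(CloudBlobContainer container, IReadOnlyList<string> files)
        {
            var tasks = files.Select(async path =>
            {
                // Block here once the concurrency cap is reached.
                await maxConcurrentThingsToProcess.WaitAsync();
                try
                {
                    // One blob per temp file, named after the file.
                    CloudBlockBlob blob = container.GetBlockBlobReference(Path.GetFileName(path));
                    using (FileStream stream = File.OpenRead(path))
                    {
                        await blob.UploadFromStreamAsync(stream);
                    }
                }
                finally
                {
                    maxConcurrentThingsToProcess.Release();
                }
            });
            await Task.WhenAll(tasks);
        }
    }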
Dividing the total size copied by the elapsed time gives around 10 MB/s (which, at 200 blobs per second, implies an average file size of roughly 50 KB).
Is there some limitation on the Azure Storage traffic you can generate, or should I use a different approach when writing so many small files?
Regarding bandwidth and operations per blob: a single blob supports up to 500 requests per second.
Blob storage is optimized for storing massive amounts of unstructured data, such as text or binary data, and is ideal for serving images or documents directly to a browser or storing files for distributed access.
@breischl Thank you for the scalability targets. After reading that post, I started searching for more target figures published by Microsoft and found four posts (too many links for my "reputation" to post here; the other three are parts 2, 3 and 4 of the same series):
http://blogs.microsoft.co.il/blogs/applisec/archive/2012/01/04/windows-azure-benchmarks-part-1-blobs-read-throughput.aspx
The first post contains an important hint: "You may have to increase the ServicePointManager.DefaultConnectionLimit for multiple threads to establish more than 2 concurrent connections with the storage."
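For reference, raising the limit is a single line that must run before the first storage request is issued (the .NET default is 2 concurrent connections per host); 300 is just the value I used in my test, not a recommended constant:

    using System.Net;

    // Must run before any request is made, because a ServicePoint that
    // already exists keeps the connection limit it was created with.
    ServicePointManager.DefaultConnectionLimit = 300;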
I set this to 300, reran the test and saw an important increase in MB/s. As I previously wrote, I had been thinking I was hitting a limit in the underlying blob service when "too many" threads write blobs; this showed that the real bottleneck was the client-side connection limit instead. So I removed all the changes made to the code to work with a semaphore and replaced them with a Parallel.For to start as many blob upload operations as possible (sketched below). The result has been awesome: 61 MB/s writing blobs and 65 MB/s reading.
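A minimal sketch of that unthrottled version, under the same SDK assumptions as the earlier snippet (files and container are the same hypothetical stand-ins):

    // Unthrottled: fan-out is now bounded by DefaultConnectionLimit and
    // the thread pool rather than by a semaphore.
    Parallel.For(0, files.Count, i =>
    {
        CloudBlockBlob blob = container.GetBlobReference(Path.GetFileName(files[i])) as CloudBlockBlob
            ?? container.GetBlockBlobReference(Path.GetFileName(files[i]));
        using (FileStream stream = File.OpenRead(files[i]))
        {
            // Synchronous upload per iteration; Parallel.For supplies the concurrency.
            blob.UploadFromStream(stream);
        }
    });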
The scalability target is 60 MB/s and I'm finally happy with the result.
Thank you all again for your answers.