Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is infrequently-accessed Azure blob storage slow?

My Azure cloud service reads and writes to blobs using the .Net storage library (1.7). The blobs are in the same data centre as the service. In my first container, operations are fast (order of 10ms). In my second container they are very slow (typically about 2s or 14s, not much in between). Both are transferring the data using CloudBlob.DownloadToStream() into a MemoryStream. File sizes are typically less than 100kB.

Now I admit I haven't set up a proper test to be able to demonstrate all the above - I'm just going by my log files, so there could be some subtle difference in the way I am accessing the blobs. Apologies if this turns out to be the case.

Anyway, the only relevant difference between these two containers seems to be:

  • The fast container is accessed frequently (tens of thousands of requests per day), and the slow container quite infrequently (perhaps 200 requests per day).
  • The fast container typically stores items that are fetched soon afterwards. The slow container is often loading things that might have been stored days ago.

Question: What factors affect blob performance for infrequently-accessed blobs? What can I do to make it faster?

(I don't know how Azure blob storage is implemented, but based on the above I'm going to guess that the data is saved into a storage array and accessed via a dynamically scaling collection of VMs, each of which implements in-memory caching of blobs. Thus the ~14s delay occurs when Azure finds it needs to spin up the VMs. The ~2s delay occurs when a VM is available, but it needs to hunt down the data on a physical disk (seems rather slow), and the 10ms delay occurs when the item is stored in an in-memory cache, or something like that.)

like image 290
Oliver Bock Avatar asked Sep 03 '13 04:09

Oliver Bock


1 Answers

Windows Azure Storage is not architected how you are describing (with an expanding number of cache VMs), so there would be no impact of some data being cached and other data not being cached on the Azure Storage server side. See Windows Azure Storage Architecture Overview for a good overview, or SOSP Paper - Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency for a more in depth look.

To determine why your blob requests are slower, the first thing to do would be to determine if the slow performance is server side or client side. Fortunately Azure Storage makes this easy via the Storage Analytics (Windows Azure Storage Logging: Using Logs to Track Storage Requests) - just compare the End To End latency and the Server Latency. I suspect you will see one of two things:

  1. Low E2E and Low Server. This would indicate that either the request is getting delayed being sent from the client (ie. not enough worker threads), or your logging is providing incorrect data.
  2. High E2E and Low Server. This would indicate a problem on the client side in processing the request (not enough worker threads to process the Response, slow processing of the memory stream, etc).
like image 151
kwill Avatar answered Sep 30 '22 06:09

kwill