
Azure blob's block list is empty, but blob is not empty! How can this be?

This issue in a nutshell:

A block blob can be created with a single PUT request (the Put Blob operation). The resulting blob has committed content, but it does not have any committed blocks!

This means that you cannot assume that the concatenation of committed blocks is the same as the committed content.

When working with block blobs you'll have to pay extra attention to blobs with empty block lists, because such blobs may or may not be empty!


The original question:

One of our storage blobs in an Azure account has an empty block list, although it is non-empty.

I'm retrieving the block list like this (C#):

foreach (var block in _cloudBlob.DownloadBlockList(
    BlockListingFilter.Committed, 
    AccessCondition.GenerateLeaseCondition(_leaseId)))
{
    // ...
}

The code in the foreach block is NOT executed. The returned list is empty.

However, the blob reports that it has a non-zero length when I check: _cloudBlob.Properties.Length

I can also download the blob and see that it is not empty.

Am I missing something? How can the block list be empty when the blob is not?!

It does not matter whether I use BlockListingFilter.Committed, BlockListingFilter.Uncommitted or BlockListingFilter.All; the list is still empty!

UPDATE

I have copied this blob to a public container so that this issue can be reproduced by anyone.

Here's how to reproduce what I'm unable to understand:

First get blob properties from Azure using the REST API:

HEAD http://dfdev.blob.core.windows.net/pub/test HTTP/1.1
Host: dfdev.blob.core.windows.net

Response:

HTTP/1.1 200 OK
Content-Length: 66
Content-Type: application/octet-stream
Last-Modified: Sat, 02 Feb 2013 09:37:19 GMT
ETag: 0x8CFCF40075A5F31
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4b149a7e-2fcd-4ab4-8d53-12ef047cbfa1
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Date: Sat, 02 Feb 2013 09:40:54 GMT

The response headers tell us that this is a block blob and that it has a length of 66 bytes.

Now retrieve the block list from:

http://dfdev.blob.core.windows.net/pub/test?comp=blocklist

Response body:

<?xml version="1.0" encoding="utf-8"?><BlockList><CommittedBlocks /></BlockList>

So the blob does not have any committed blocks, yet it has a length of 66 bytes!
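To make the mismatch concrete, here is a minimal sketch that sums the committed block sizes in a Get Block List response body so the total can be compared with the Content-Length from the HEAD request. The names BlockListCheck and SumCommittedBlockSizes are hypothetical, and the sketch assumes that each committed block appears as a Block element with a Size child, which is how the REST API reports non-empty block lists.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Azure storage client library.
public static class BlockListCheck
{
    // Returns the total size in bytes of all <Block> entries under
    // <CommittedBlocks>. An empty <CommittedBlocks /> element yields 0.
    public static long SumCommittedBlockSizes(string blockListXml)
    {
        XDocument doc = XDocument.Parse(blockListXml);
        return doc.Descendants("CommittedBlocks")
                  .Elements("Block")
                  .Sum(block => (long)block.Element("Size"));
    }
}
```

For the response body shown above, the sum is zero even though the blob reports 66 bytes, which is exactly the discrepancy in question.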

Is this a bug or have I misunderstood something?

Please help me out!

UPDATE 2

I've found that if I upload the blob like this:

using (var stream = File.OpenRead("test-blob"))
{
    container.GetBlockBlobReference("put-only").UploadFromStream(stream);
}

...then a single PUT request is sent to Azure and the blob gets an empty block list (just like above).

However, if I upload the blob like this:

var blob = container.GetBlockBlobReference("put-block");
// Block IDs must be Base64-encoded strings.
string blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
using (var stream = File.OpenRead("test-blob"))
{
    blob.PutBlock(blockId, stream, null); // null: no Content-MD5 supplied
}
blob.PutBlockList(new string[] { blockId });

...then two requests are sent to Azure (one for putting the block and another for putting the block list).

The second blob gets a non-empty block list.

Why won't a single PUT yield a block list?

Can't we rely on the concatenation of a blob's committed blocks being equal to the blob's actual content?!

If not, how shall we determine when the block list is OK and when it's not??

UPDATE 3

I've implemented a workaround that I think suffices for the case where we encountered this problem. If we discover an empty block list AND a blob length greater than zero, we assume that everything is OK (although it really isn't) and rewrite that data using Put Block and Put Block List at the next opportunity.
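Roughly, the check could look like the sketch below. The names BlockListState and BlockListTriage.Classify are hypothetical. The inputs are assumed to be the blob's reported length (for example _cloudBlob.Properties.Length) and the number of entries returned by DownloadBlockList with BlockListingFilter.Committed.

```csharp
using System;

// Hypothetical classification of a block blob's block list.
public enum BlockListState
{
    EmptyBlob,       // no committed blocks and no content
    BlockListUsable, // at least one committed block
    NeedsRewrite     // content exists but no blocks (e.g. created by a single PUT)
}

public static class BlockListTriage
{
    public static BlockListState Classify(long blobLength, int committedBlockCount)
    {
        if (blobLength < 0)
            throw new ArgumentOutOfRangeException(nameof(blobLength));
        if (committedBlockCount < 0)
            throw new ArgumentOutOfRangeException(nameof(committedBlockCount));

        if (committedBlockCount > 0)
            return BlockListState.BlockListUsable;

        // Empty block list: only the length tells us whether content exists.
        return blobLength > 0
            ? BlockListState.NeedsRewrite
            : BlockListState.EmptyBlob;
    }
}
```

A blob classified as NeedsRewrite is the one to rewrite with Put Block and Put Block List at the next opportunity.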

However, although this will do the trick in our case, it is still very confusing that a non-empty block blob can have an empty list of committed blocks!!

Is this by design in Azure? Can anyone explain what's going on?

UPDATE 4

Microsoft confirmed this issue on the MSDN forums too. Quote from Allen Chen:

I've confirmed with the product team. This is a normal behavior. The x-ms-blob-content-length header is the size of the committed blob. In your case you use Put Blob API so all content is uploaded in a single API and is committed in the same request. As a result in the Get Block List API's response you see the x-ms-blob-content-length header has value of 66 which means the committed blob size.

We have been aware of the issue that the MSDN document of the Get Block List API is not quite clear on this and will work on it.

asked Feb 01 '13 17:02 by Mårten Wikström


1 Answer

As you also identified with your tests, querying the list of blocks of a block blob uploaded using Put Blob will return an empty list. This is by design.

The UploadFromStream API in the Storage Client Library makes a couple of checks before deciding whether to upload a blob using a single Put Blob operation or a sequence of Put Block operations followed by a Put Block List. One property that changes this behavior is SingleBlobUploadThresholdInBytes.

answered Oct 20 '22 12:10 by Serdar Ozler