 

Azure CloudAppendBlob errors with concurrent access

My understanding was that the Azure CloudAppendBlob is safe from concurrency issues, since you can only append to this kind of blob and it does not need to compare ETags, as stated in this post:

http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/13/introducing-azure-storage-append-blob.aspx

specifically:

In addition, Append Blob supports having multiple clients writing to the same blob without any need for synchronization (unlike block and page blob)

However the following unit test raises:

412 the append position condition specified was not met.

Stack trace:

Microsoft.WindowsAzure.Storage.Blob.BlobWriteStream.Flush()
Microsoft.WindowsAzure.Storage.Blob.BlobWriteStream.Commit()
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.UploadFromStreamHelper
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.AppendFromStream
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.AppendFromByteArray
Microsoft.WindowsAzure.Storage.Blob.CloudAppendBlob.AppendText

Here is the unit test. Maybe the service handles requests from different contexts, but not parallel calls like these?

    [TestMethod]
    public void test_append_text_concurrency()
    {
        AppendBlobStorage abs = new AppendBlobStorage(new TestConnectConfig(), "testappendblob");

        string filename = "test-concurrent-blob";

        abs.Delete(filename);                       

        Parallel.Invoke(
            () => { abs.AppendText(filename, "message1\r\n"); },
            () => { abs.AppendText(filename, "message2\r\n"); }
        );

        string text = abs.ReadText(filename);

        Assert.IsTrue(text.Contains("message1"));
        Assert.IsTrue(text.Contains("message2"));
    }

The AppendText method in AppendBlobStorage:

    public void AppendText(string filename, string text)
    {
        CloudAppendBlob cab = m_BlobStorage.BlobContainer.GetAppendBlobReference(filename);

        // Create if it doesn't exist
        if (!cab.Exists())
        {
            try
            {
                cab.CreateOrReplace(AccessCondition.GenerateIfNotExistsCondition(), null, null);
            }
            catch { }
        }

        // Append the text
        cab.AppendText(text);      
    }

Maybe I'm missing something. The reason I'm trying to do this is that I have multiple web jobs which can all write to this append blob, and I figured this was what it was designed for?

Asked by James, Sep 11 '15 18:09



2 Answers

CloudAppendBlob has several append methods:

AppendBlock/AppendFromByteArray/AppendFromFile/AppendFromStream/AppendText

All of them ultimately call the same REST API operation, Append Block. See the documentation: https://learn.microsoft.com/en-us/rest/api/storageservices/append-block

But only AppendBlock should be used in a multi-writer scenario; all the others are meant for a single-writer scenario. The reason is that AppendBlock, by default, does NOT send the header x-ms-blob-condition-appendpos with the PUT HTTP request.

The header x-ms-blob-condition-appendpos basically says: this block MUST be appended at exactly this offset of the blob. If the blob's current length differs, the service rejects the request with 412 (the append position condition specified was not met). (x-ms-blob-append-offset, by contrast, is the response header that reports where the block was actually committed.)

So for AppendBlock the HTTP request looks like this:

    PUT https://test.blob.core.windows.net/test/20180323.log?comp=appendblock HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-client-request-id: bb7f5a93-191d-40f9-8b92-4ec0476be920
    x-ms-date: Fri, 23 Mar 2018 20:21:29 GMT
    Authorization: SharedKey XXXXX
    Host: test.blob.core.windows.net
    Content-Length: 99

All the other append methods send the header x-ms-blob-condition-appendpos. Its value should be the current length of the blob before the append. So how does the library know that value? It sends a HEAD HTTP request to get that information:

    HEAD http://test.blob.core.windows.net/test/20180323.log HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-client-request-id: 1cdb3731-9d72-41ab-afee-d4f462e9b0c2
    x-ms-date: Fri, 23 Mar 2018 20:29:19 GMT
    Authorization: SharedKey XXXX
    Host: test.blob.core.windows.net

The value of the Content-Length response header then becomes the value of the x-ms-blob-condition-appendpos header in the following PUT HTTP request:

    PUT http://test.blob.core.windows.net/test/20180323.log?comp=appendblock HTTP/1.1
    User-Agent: Azure-Storage/9.1.0 (.NET CLR 4.0.30319.42000; Win32NT 6.2.9200.0)
    x-ms-version: 2017-07-29
    x-ms-blob-condition-appendpos: 1287
    x-ms-client-request-id: 1cdb3731-9d72-41ab-afee-d4f462e9b0c2
    x-ms-date: Fri, 23 Mar 2018 20:29:20 GMT
    Authorization: SharedKey XXXXX
    Host: test.blob.core.windows.net
    Content-Length: 99

Back to the original question: when two parallel tasks call AppendText at the same time, both will most likely send the HEAD request first and get the same blob length. The task whose PUT request arrives first succeeds, but the task whose PUT request arrives later fails with 412, because the blob's length has already changed and that offset has been taken by the first PUT request.
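To make the race concrete, here is a minimal in-memory sketch. FakeAppendBlob, GetLength and TryAppend are hypothetical names made up for illustration; they are not part of the Azure SDK. GetLength stands in for the HEAD request, and the optional expectedPosition argument stands in for the append position condition on the PUT. The interleaving is written out by hand so the result is deterministic.

```csharp
using System;

// Hypothetical in-memory stand-in for an append blob; NOT the Azure SDK.
class FakeAppendBlob
{
    private readonly object _gate = new object();
    private long _length;

    // Models the HEAD request: returns the current blob length.
    public long GetLength()
    {
        lock (_gate) { return _length; }
    }

    // Models PUT ?comp=appendblock. When expectedPosition is supplied it plays
    // the role of the append position condition; a mismatch models HTTP 412.
    public bool TryAppend(int byteCount, long? expectedPosition = null)
    {
        lock (_gate)
        {
            if (expectedPosition.HasValue && expectedPosition.Value != _length)
                return false; // stale position: the "412" case
            _length += byteCount;
            return true;
        }
    }
}

static class AppendRaceDemo
{
    static void Main()
    {
        var blob = new FakeAppendBlob();

        // Both writers read the length before either one appends.
        long seenByA = blob.GetLength();
        long seenByB = blob.GetLength();

        Console.WriteLine(blob.TryAppend(10, seenByA)); // True
        Console.WriteLine(blob.TryAppend(10, seenByB)); // False: position is stale

        // Without the condition the blob picks the offset, so both succeed.
        Console.WriteLine(blob.TryAppend(10)); // True
        Console.WriteLine(blob.TryAppend(10)); // True
        Console.WriteLine(blob.GetLength());   // 30
    }
}
```

The second conditional append fails for the same reason the second AppendText call fails in the unit test: the length it read is no longer the end of the blob.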

So if you have a multi-writer scenario, AppendBlock is the method that works right now. But you do have to be aware of the following:

  • you have no control over the position of the block in the blob
  • each block has a size limit (4 MB for the service version used in the requests above)
  • if you use AppendBlock to upload more than 4 MB of data, the request fails with the response: HTTP/1.1 413 The request body is too large and exceeds the maximum permissible limit
  • if you use one of the other methods to upload a large payload, the library sends one HEAD request to get the blob length and then automatically splits the data into multiple PUT requests. The block size is controlled by CloudAppendBlob.StreamWriteSizeInBytes; if you don't set it, it defaults to 4 MB.
  • as the name AppendBlock hints, it appends exactly one block. If you want to upload a large payload, you have to split the data yourself, and in a multi-writer scenario you cannot guarantee that the split blocks end up next to each other in the blob.
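Splitting the data yourself could look roughly like the sketch below. BlockSplitter and Split are hypothetical helper names, not library APIs, and the 4 MB default is an assumption taken from the limit described above. The demo uses a tiny block size so the output is easy to follow.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the storage client library.
static class BlockSplitter
{
    // Assumed per-block limit of 4 MB, as described in the list above.
    public const int MaxBlockBytes = 4 * 1024 * 1024;

    // Yields consecutive slices of data, each at most maxBlockBytes long.
    public static IEnumerable<ArraySegment<byte>> Split(byte[] data, int maxBlockBytes = MaxBlockBytes)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (maxBlockBytes <= 0) throw new ArgumentOutOfRangeException(nameof(maxBlockBytes));

        int offset = 0;
        while (offset < data.Length)
        {
            int count = Math.Min(maxBlockBytes, data.Length - offset);
            yield return new ArraySegment<byte>(data, offset, count);
            offset += count;
        }
    }
}

static class Program
{
    static void Main()
    {
        byte[] payload = new byte[10];

        // A block size of 4 bytes, only to keep the demo output small.
        foreach (ArraySegment<byte> block in BlockSplitter.Split(payload, 4))
            Console.WriteLine(block.Offset + " " + block.Count);
        // prints "0 4", then "4 4", then "8 2"
    }
}
```

Each segment can then be wrapped in a MemoryStream (new MemoryStream(block.Array, block.Offset, block.Count)) and passed to cab.AppendBlock, one call per segment. With several writers, another writer's block may still land between two of yours.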
Answered by liuhongbo, Oct 17 '22 13:10



After a bit more searching it looks like this is an actual problem.

I guess append blob support in the client library is fairly new. (There are also other issues with it at the moment; see

http://blogs.msdn.com/b/windowsazurestorage/archive/2015/09/02/issue-in-azure-storage-client-library-5-0-0-and-5-0-1-preview-in-appendblob-functionality.aspx)

Anyway, I fixed the issue by using the AppendBlock variant rather than AppendText, as suggested here:

https://azurekan.wordpress.com/2015/09/08/issues-with-adding-text-to-azure-storage-append-blob/

Here is the changed AppendText method, which passes the unit test defined above:

    public void AppendText(string filename, string text)
    {
        if (string.IsNullOrWhiteSpace(filename))
            throw new ArgumentException("filename cannot be null or empty");

        if (!string.IsNullOrEmpty(text))
        {
            CloudAppendBlob cab = m_BlobStorage.BlobContainer.GetAppendBlobReference(filename);

            // Create if it doesn't exist
            if (!cab.Exists())
            {
                try
                {
                    cab.CreateOrReplace(AccessCondition.GenerateIfNotExistsCondition(), null, null);
                }
                catch (StorageException) { /* most likely another writer created the blob first */ }
            }

            // use append block as append text seems to have an error at the moment.
            using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(text)))
            {
                cab.AppendBlock(ms);
            }
        }

    }
Answered by James, Oct 17 '22 15:10
