
"The specified block list is invalid" while uploading blobs in parallel

I've a (fairly large) Azure application that uploads (fairly large) files in parallel to Azure blob storage.

In a few percent of uploads I get an exception:

The specified block list is invalid.

System.Net.WebException: The remote server returned an error: (400) Bad Request.

This is when we run a fairly innocuous looking bit of code to upload a blob in parallel to Azure storage:

    public static void UploadBlobBlocksInParallel(this CloudBlockBlob blob, FileInfo file) 
    {
        blob.DeleteIfExists();
        blob.Properties.ContentType = file.GetContentType();
        blob.Metadata["Extension"] = file.Extension;

        byte[] data = File.ReadAllBytes(file.FullName);

        int numberOfBlocks = (data.Length / BlockLength) + 1;
        string[] blockIds = new string[numberOfBlocks];

        Parallel.For(
            0, 
            numberOfBlocks, 
            x =>
        {
            string blockId = Convert.ToBase64String(Guid.NewGuid().ToByteArray());
            int currentLength = Math.Min(BlockLength, data.Length - (x * BlockLength));

            using (var memStream = new MemoryStream(data, x * BlockLength, currentLength))
            {
                var blockData = memStream.ToArray();
                var md5Check = System.Security.Cryptography.MD5.Create();
                var md5Hash = md5Check.ComputeHash(blockData, 0, blockData.Length);

                blob.PutBlock(blockId, memStream, Convert.ToBase64String(md5Hash));
            }

            blockIds[x] = blockId;
        });

        byte[] fileHash  = _md5Check.ComputeHash(data, 0, data.Length);
        blob.Metadata["Checksum"] = BitConverter.ToString(fileHash).Replace("-", string.Empty);
        blob.Properties.ContentMD5 = Convert.ToBase64String(fileHash);

        data = null;
        blob.PutBlockList(blockIds);
        blob.SetMetadata();
        blob.SetProperties();
    }

All very mysterious; I'd think the algorithm we're using to calculate the block list should produce strings that are all the same length...
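Two properties of the snippet above can be checked without touching storage. The following is a minimal sketch, not a diagnosis: `BlockPlan` and its members are hypothetical names, and the 4 MB `BlockLength` is an assumption, since the question does not show the real value. It confirms that GUID-based IDs always come out the same length, and it shows that `(data.Length / BlockLength) + 1` plans one extra, zero-length block whenever the file size is an exact multiple of the block size.

```csharp
using System;

public static class BlockPlan
{
    // Assumed block size; the question does not show the value of BlockLength.
    public const int BlockLength = 4 * 1024 * 1024;

    // Block count exactly as computed in the question.
    // On an exact multiple of BlockLength this yields one extra, empty block.
    public static int CountAsInQuestion(int dataLength)
    {
        return (dataLength / BlockLength) + 1;
    }

    // Ceiling division: only as many blocks as there is data to fill.
    // The long cast avoids int overflow for lengths close to int.MaxValue.
    public static int CountCeiling(int dataLength)
    {
        return (int)(((long)dataLength + BlockLength - 1) / BlockLength);
    }

    // Same ID scheme as the question: Base64 of a GUID's 16 bytes.
    // 16 bytes always encode to 24 Base64 characters.
    public static string NewBlockId()
    {
        return Convert.ToBase64String(Guid.NewGuid().ToByteArray());
    }
}
```

For a file of exactly `2 * BlockPlan.BlockLength` bytes, `CountAsInQuestion` returns 3 while `CountCeiling` returns 2, and `NewBlockId().Length` is always 24. So the IDs are indeed uniform in length, as the question assumes. Whether the empty trailing block contributes to the error is not established here.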

Jeremy McGee asked Oct 16 '12 15:10


2 Answers

We ran into a similar issue; however, we were not specifying block IDs or using them anywhere. In our case, we were using:

    using (CloudBlobStream stream = blob.OpenWrite(condition))
    {
        // [write data to stream]

        stream.Flush();
        stream.Commit();
    }

This caused "The specified block list is invalid." errors under parallelized load. Switching the code to the UploadFromStream(...) method, with the data buffered in memory first, fixed the issue:

    using (MemoryStream stream = new MemoryStream())
    {
        // [write data to stream]

        stream.Seek(0, SeekOrigin.Begin);
        blob.UploadFromStream(stream, condition);
    }

Obviously, buffering too much data in memory has a cost, and the code here is a simplification. Note that UploadFromStream(...) also uses Commit() in some cases, but it checks additional conditions to determine the best upload method.
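The buffer-then-upload pattern, together with a guard against the memory concern, can be sketched without the storage SDK. This is an illustration under stated assumptions: `BufferedUpload`, `Run`, and the `maxBytes` cap are hypothetical names, and the `upload` delegate merely stands in for a call such as `blob.UploadFromStream(stream, condition)`.

```csharp
using System;
using System.IO;

public static class BufferedUpload
{
    // write:    fills the buffer (the "[write data to stream]" step).
    // upload:   hypothetical stand-in for blob.UploadFromStream(stream, condition).
    // maxBytes: hypothetical cap that keeps memory use bounded.
    // Returns the number of bytes handed to the upload delegate.
    public static long Run(Action<Stream> write, Action<Stream> upload, long maxBytes)
    {
        using (var buffer = new MemoryStream())
        {
            write(buffer);

            long length = buffer.Length;
            if (length > maxBytes)
            {
                throw new InvalidOperationException(
                    "Payload of " + length + " bytes exceeds the in-memory limit of " + maxBytes + " bytes.");
            }

            // Rewind so the upload reads from the first byte, not from the end.
            buffer.Seek(0, SeekOrigin.Begin);
            upload(buffer);
            return length;
        }
    }
}
```

A caller would pass the real upload as the second argument, for example `s => blob.UploadFromStream(s, condition)`. Payloads above the cap fail fast instead of exhausting memory, and would need a streaming path instead.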

Andacious answered Sep 22 '22 17:09


This exception can also happen when multiple threads open a stream to a blob with the same name and try to write to it simultaneously.
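A plausible mechanism, as far as I understand the Put Block List semantics: when one writer commits its list, uncommitted blocks not in that list are discarded, so the other writer's later commit refers to blocks that no longer exist. One way to rule this out inside a single process is to serialize writes per blob name. The sketch below assumes all writers share one process. `BlobWriteGate` and `WithExclusiveWrite` are hypothetical names, and the `write` delegate stands in for the actual upload call.

```csharp
using System;
using System.Collections.Concurrent;

public static class BlobWriteGate
{
    // One lock object per blob name. Blob names are case-sensitive, hence Ordinal.
    // Entries are never removed, so this suits a bounded set of names only.
    private static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>(StringComparer.Ordinal);

    // Runs the write while holding the lock for blobName, so two threads
    // never upload to the same blob at once. Different names do not block each other.
    public static void WithExclusiveWrite(string blobName, Action write)
    {
        object gate = Locks.GetOrAdd(blobName, _ => new object());
        lock (gate)
        {
            write();
        }
    }
}
```

This only protects against threads in the same process. Writers on different machines or role instances would need a cross-process mechanism, such as a blob lease, or distinct blob names per writer.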

Martin Staufcik answered Sep 21 '22 17:09