
Why is uploading to an Azure blob so slow?

I have a custom stream that writes directly to a cloud page blob.

public sealed class WindowsAzureCloudPageBlobStream : Stream
{
    // 4 MB is the upper limit for a single page blob write operation
    public const int MaxPageWriteCapacity = 4 * 1024 * 1024;

    // Every page blob operation must work on a size rounded up to a multiple of 512 bytes
    private const int PageBlobPageAdjustmentSize = 512;

    private CloudPageBlob _pageBlob;

    public override void Write(byte[] buffer, int offset, int count)
    {
        var additionalOffset = 0;
        var bytesToWriteTotal = count;

        List<Task> list = new List<Task>();
        while (bytesToWriteTotal > 0)
        {
            var bytesToWriteTotalAdjusted = RoundUpToPageBlobSize(bytesToWriteTotal);

            // Azure does not allow us to write as many bytes as we want
            // Max allowed size per write is 4MB
            var bytesToWriteNow = Math.Min((int)bytesToWriteTotalAdjusted, MaxPageWriteCapacity);
            var adjustmentBuffer = new byte[bytesToWriteNow];
            ...
            var memoryStream = new MemoryStream(adjustmentBuffer, 0, bytesToWriteNow, false, false);
            var task = _pageBlob.WritePagesAsync(memoryStream, Position, null);
            list.Add(task);
        }

        Task.WaitAll(list.ToArray());
    }

    private static long RoundUpToPageBlobSize(long size)
    {
        return (size + PageBlobPageAdjustmentSize - 1) & ~(PageBlobPageAdjustmentSize - 1);
    }
}
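To make the chunking rules in the comments above concrete, here is a minimal, SDK-free sketch of how a write could be split into page-aligned chunks of at most 4 MB, each with its own blob offset. The `PageBlobChunkPlanner` class and its `Plan` method are hypothetical names introduced only for illustration; they are not part of the Azure SDK or of the original stream class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class PageBlobChunkPlanner
{
    // Same limits as in the question: 4 MB per write, 512-byte pages.
    public const int MaxPageWriteCapacity = 4 * 1024 * 1024;
    private const int PageSize = 512;

    // Rounds a size up to the next multiple of 512 bytes.
    public static long RoundUpToPageSize(long size)
    {
        return (size + PageSize - 1) & ~((long)PageSize - 1);
    }

    // Yields one (offset, length) pair per write operation.
    // Because this is an iterator, the argument check runs on first enumeration.
    public static IEnumerable<KeyValuePair<long, int>> Plan(long startOffset, long byteCount)
    {
        if (startOffset % PageSize != 0)
            throw new ArgumentException("startOffset must be a multiple of 512", nameof(startOffset));

        long remaining = RoundUpToPageSize(byteCount);
        long offset = startOffset;
        while (remaining > 0)
        {
            int length = (int)Math.Min(remaining, MaxPageWriteCapacity);
            yield return new KeyValuePair<long, int>(offset, length);
            offset += length;     // each chunk gets a distinct blob offset
            remaining -= length;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // 9 MB + 100 bytes: two full 4 MB chunks plus a padded remainder.
        foreach (var chunk in PageBlobChunkPlanner.Plan(0, 9 * 1024 * 1024 + 100))
            Console.WriteLine("offset={0} length={1}", chunk.Key, chunk.Value);
        // offset=0 length=4194304
        // offset=4194304 length=4194304
        // offset=8388608 length=1049088
    }
}
```

Planning the offsets up front means every chunk knows where it belongs in the blob, which matters once several writes are in flight at the same time.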

Write() performs poorly. For example:

Stopwatch s = new Stopwatch();
s.Start();
using (var memoryStream = new MemoryStream(adjustmentBuffer, 0, bytesToWriteNow, false, false))
{
      _pageBlob.WritePages(memoryStream, Position);
}

s.Stop();
Console.WriteLine(s.Elapsed); // => 00:00:01.52, i.e. an average speed of about 2.4 MB/s

How can I improve my algorithm? How can I use Parallel.ForEach to speed up the process?

Why do I get only about 2.5 MB/s, not the 60 MB/s reported on the official site and in this benchmark: http://blogs.microsoft.co.il/applisec/2012/01/05/windows-azure-benchmarks-part-2-blob-write-throughput/
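For context on the Parallel.ForEach idea: Parallel.ForEach targets CPU-bound work and does not await asynchronous delegates, so for I/O-bound uploads a common alternative is to start the async writes and cap how many are in flight. The sketch below shows that pattern under stated assumptions: `BoundedParallelWriter` and `FakeWriteAsync` are hypothetical names, and the fake writer only simulates latency with `Task.Delay`. In real code the delegate would wrap a call such as `_pageBlob.WritePagesAsync(stream, offset, null)` with a distinct offset per chunk.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class BoundedParallelWriter
{
    // Runs writeChunkAsync once per offset, with at most maxConcurrency calls in flight.
    public static async Task WriteAllAsync(
        IEnumerable<long> offsets,
        Func<long, Task> writeChunkAsync,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = offsets.Select(async offset =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try { await writeChunkAsync(offset).ConfigureAwait(false); }
                finally { gate.Release(); }
            }).ToList(); // ToList starts every task immediately

            await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }
}

public static class Program
{
    private static readonly object Sync = new object();
    private static int _inFlight;
    private static int _peak;

    // Stand-in for a real page write: simulates 100 ms of network latency.
    private static async Task FakeWriteAsync(long offset)
    {
        lock (Sync) { _inFlight++; if (_inFlight > _peak) _peak = _inFlight; }
        await Task.Delay(100).ConfigureAwait(false);
        lock (Sync) { _inFlight--; }
    }

    public static void Main()
    {
        // Sixteen 4 MB chunks at consecutive offsets.
        var offsets = Enumerable.Range(0, 16).Select(i => (long)i * 4 * 1024 * 1024);

        var watch = Stopwatch.StartNew();
        BoundedParallelWriter.WriteAllAsync(offsets, FakeWriteAsync, 8).GetAwaiter().GetResult();
        watch.Stop();

        Console.WriteLine("elapsed={0} peak in flight={1}", watch.Elapsed, _peak);
    }
}
```

Two caveats if this pattern is applied to the stream above. Concurrent writes only help if each one targets a different blob offset. Also, on the .NET Framework, `ServicePointManager.DefaultConnectionLimit` defaults to 2 for non-ASP.NET applications, which can limit how many requests actually run concurrently, so it may be worth checking that setting as well.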

Anatoly asked Apr 19 '16 09:04




1 Answer

One simple and quick thing to check: make sure your blob storage is in the same Azure region where your VM or application is running. One issue we ran into was that our storage account was in a different region from our application. This caused a significant delay during processing. We were scratching our heads until we realized we were reading and writing across regions. Rookie mistake on our part!

cnaegle answered Sep 20 '22 03:09
