I have a custom stream that is used to perform write operations directly into the page cloud blob.
public sealed class WindowsAzureCloudPageBlobStream : Stream
{
// 4 MB is the top most limit for page blob write operations
public const int MaxPageWriteCapacity = 4 * 1024 * 1024;
// Every operation on a page blob has to manipulate a value which is rounded up to 512 bytes
private const int PageBlobPageAdjustmentSize = 512;
private CloudPageBlob _pageBlob;
public override void Write(byte[] buffer, int offset, int count)
{
var additionalOffset = 0;
var bytesToWriteTotal = count;
List<Task> list = new List<Task>();
while (bytesToWriteTotal > 0)
{
var bytesToWriteTotalAdjusted = RoundUpToPageBlobSize(bytesToWriteTotal);
// Azure does not allow us to write as many bytes as we want
// Max allowed size per write is 4MB
var bytesToWriteNow = Math.Min((int)bytesToWriteTotalAdjusted, MaxPageWriteCapacity);
var adjustmentBuffer = new byte[bytesToWriteNow];
...
var memoryStream = new MemoryStream(adjustmentBuffer, 0, bytesToWriteNow, false, false);
var task = _pageBlob.WritePagesAsync(memoryStream, Position, null);
list.Add(task);
}
Task.WaitAll(list.ToArray());
}
private static long RoundUpToPageBlobSize(long size)
{
return (size + PageBlobPageAdjustmentSize - 1) & ~(PageBlobPageAdjustmentSize - 1);
}
I have a low performance of Write()
. For example:
Stopwatch s = new Stopwatch();
s.Start();
using (var memoryStream = new MemoryStream(adjustmentBuffer, 0, bytesToWriteNow, false, false))
{
_pageBlob.WritePages(memoryStream, Position);
}
s.Stop();
Console.WriteLine(s.Elapsed); => 00:00:01.52 == Average speed 2.4 MB/s
How can I improve my algorithm?
How to use Parallel.ForEach
to speedup the process?
Why just only 2.5 MB/sec, but not a 60MB/sec as in official site or http://blogs.microsoft.co.il/applisec/2012/01/05/windows-azure-benchmarks-part-2-blob-write-throughput/
When importing data from Azure Blob storage, it is slow generally. 1. Disable "Backgroud data" and "parallel loading of tables" in current files under options. 2. If you files have many columns of date/time type, you can disabled Data Load -> Auto Date/Time, so behind the scenes it doesn't create a date table automatically, which takes time.
The main reason you're access times are slow is because you're doing everything synchronously. The benchmarks at microsoft access the blobs in multiple threads, which will give more throughput. Now, Azure also knows that performance is an issue, which is why they've attempted to mitigate the problem by backing storage with local caching.
The benchmarks at microsoft access the blobs in multiple threads, which will give more throughput. Now, Azure also knows that performance is an issue, which is why they've attempted to mitigate the problem by backing storage with local caching.
The main reason you're access times are slow is because you're doing everything synchronously. The benchmarks at microsoft access the blobs in multiple threads, which will give more throughput.
One simple and quick thing to check: make sure your blob storage is in the same Azure region where your VM or Application is running. One issue that we ran into was our storage account was in another region from our application. This caused us a significant delay during processing. We were scratching our heads until we realized we were reading and writing across regions. Rookie mistake on our part!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With