 

Azure Table Storage Performance from Massively Parallel Threaded Reading

Short version: Can we read from dozens or hundreds of table partitions in a multi-threaded manner to increase performance by orders of magnitude?

Long version: We're working on a system that stores millions of rows in Azure table storage. We partition the data into small partitions, each containing about 500 records, which represents a day's worth of data for a unit.

Since Azure doesn't have a "sum" feature, to pull a year's worth of data we either have to use some pre-caching, or sum the data ourselves in an Azure web or worker role.

Assuming the following:

- Reading one partition doesn't affect the performance of another
- Reading a partition is bottlenecked by network speed and server retrieval time

We can then take a guess that if we wanted to quickly sum a lot of data on the fly (1 year = 365 partitions), we could use a massively parallel algorithm and it would scale almost perfectly with the number of threads. For example, we could use the .NET parallel extensions with 50+ threads and get a HUGE performance boost.

We're working on setting up some experiments, but I wanted to see if this has been done before. Since the .NET side is basically idle waiting on high-latency operations, this seems perfect for multi-threading.
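To make the fan-out idea concrete, here's a minimal sketch. `FetchPartitionSumAsync` is a hypothetical stand-in for the actual table query (simulated here with a delay); in a real worker role it would page through one partition's ~500 entities with the storage client and sum them:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelSumSketch
{
    // Hypothetical stand-in for reading and summing one day's partition.
    // A real version would query table storage for this partition key.
    static async Task<double> FetchPartitionSumAsync(string partitionKey)
    {
        await Task.Delay(50); // simulate network latency
        return 1.0;           // dummy per-partition total
    }

    static async Task<double> SumYearAsync(string unitId, int year)
    {
        // One partition key per day of the year.
        var keys = Enumerable.Range(0, 365)
            .Select(d => $"{unitId}_{new DateTime(year, 1, 1).AddDays(d):yyyyMMdd}");

        // Fan out one request per partition. Since the threads are idle
        // waiting on I/O, a high degree of parallelism is cheap.
        double[] partials = await Task.WhenAll(keys.Select(FetchPartitionSumAsync));
        return partials.Sum();
    }

    static async Task Main()
    {
        double total = await SumYearAsync("unit42", 2010);
        Console.WriteLine(total); // 365 (one dummy unit per partition)
    }
}
```

Because the per-partition work overlaps rather than running sequentially, the 365 simulated 50 ms reads complete in roughly the time of one, which is exactly the scaling behavior the question is after.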

asked Oct 07 '10 by Jason Young



1 Answer

There are limits on the number of transactions that can be performed against a storage account, and against a particular partition or storage server, in a given time period (on the order of 500 requests per second per partition). So in that sense, there is a reasonable ceiling on the number of requests you can execute in parallel before it starts to look like a DoS attack.
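One simple way to fan out without blowing past those limits is to gate the in-flight requests with a semaphore. This is a sketch; the cap of 50 is an illustrative assumption, not a documented number, and `ReadPartitionAsync` is a placeholder for the real table query:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class ThrottledFanOut
{
    // Cap concurrent requests; 50 is an arbitrary illustrative value.
    static readonly SemaphoreSlim Gate = new SemaphoreSlim(50);

    static async Task<double> ReadPartitionAsync(int day)
    {
        await Gate.WaitAsync();
        try
        {
            await Task.Delay(10); // placeholder for the actual table query
            return day;           // dummy per-partition result
        }
        finally
        {
            Gate.Release();
        }
    }

    static async Task Main()
    {
        // Launch all 365 reads; at most 50 are in flight at any moment.
        double[] results = await Task.WhenAll(
            Enumerable.Range(1, 365).Select(ReadPartitionAsync));
        Console.WriteLine(results.Sum()); // 66795
    }
}
```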

Also, in implementation, I would be wary of the concurrent connection limits imposed on the client by System.Net.ServicePointManager (the default for a non-ASP.NET client is only two connections per host, which would serialize most of a 50-thread fan-out). I am not sure whether the Azure storage client adjusts those limits itself; they might require adjustment.
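Raising the limit is a one-liner, typically done once at startup before any requests are made. The value 100 is an example, not a recommendation; `Expect100Continue` and `UseNagleAlgorithm` are often turned off alongside it when issuing many small requests to table storage:

```csharp
using System;
using System.Net;

class ConnectionLimitSetup
{
    static void Main()
    {
        // Default is 2 concurrent connections per host for client apps.
        ServicePointManager.DefaultConnectionLimit = 100;

        // Commonly disabled for many small table-storage requests:
        // skip the 100-Continue handshake and Nagle buffering delays.
        ServicePointManager.Expect100Continue = false;
        ServicePointManager.UseNagleAlgorithm = false;

        Console.WriteLine(ServicePointManager.DefaultConnectionLimit); // 100
    }
}
```

Note that on modern .NET (Core), `HttpClient`'s default handler ignores `ServicePointManager`; there you would set the equivalent limit on `SocketsHttpHandler.MaxConnectionsPerServer` instead.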

answered Nov 15 '22 by Michael Petito