The documentation states that Azure Table Storage partitions have a scalability target of 500 entities per second.
If my data is partitioned correctly, would parallel operations on each of these partitions have no effect on each other?
For example, if I had to do expensive full table scans on partition A (maxing out at 500 entities/second), would the performance of any operation occurring on partition B be affected?
Storage accounts have a limit of 5,000 operations/second. Does this essentially mean that I can max out 10 partitions before they start to affect each other's performance?
For comparison, Azure Cosmos DB guarantees read and write latency of under 10 milliseconds at the 99th percentile. With Azure Table Storage, throughput is capped at 20,000 operations per second per storage account, while Cosmos DB supports throughput of more than 10 million operations per second.
Azure Table Storage was not created by God; it was created by folks like us (much smarter, though). It has limits on the size of a single entity (1 MB), the size of a single property (64 KB), the number of properties per entity (255), and so on.
Azure Table Storage is not deprecated, and many organizations still use it because it is simple and inexpensive. If you run up against its limits, though, it is worth looking at alternatives such as Cosmos DB, which exposes a compatible Table API.
How should you choose a good partition key for a Table Storage implementation? Partition keys do not need to be unique the way a primary key in a SQL table is; using the same partition key for every record defeats scale-out entirely; and you should think about how you are likely to update the data, because batch (entity group) transactions only work within a single partition, as the sketch below shows.
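The batch-transaction point is the one that most often bites people: entity group transactions are atomic, but only within one partition. Here is a minimal sketch using the azure-data-tables Python SDK; the connection string, table name, and entities are placeholders, not anything from this thread:

```python
# Entity group transactions require every entity to share one PartitionKey.
from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    "<your-connection-string>", table_name="Orders"
)

operations = [
    ("upsert", {"PartitionKey": "customer-42", "RowKey": "order-001", "Total": 19.99}),
    ("upsert", {"PartitionKey": "customer-42", "RowKey": "order-002", "Total": 5.25}),
]
# Succeeds or fails as a unit; mixing PartitionKeys here would be rejected.
client.submit_transaction(operations)
```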
As a general rule, you want to avoid table scans whenever possible. They are very expensive operations (especially if you have a lot of partitions): not so much from a table-stress standpoint, but because they have very high aggregate latency (explained below). That said, sometimes there is simply no avoiding it.
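To make the cost difference concrete, here is a hedged Python sketch (azure-data-tables SDK; the connection string, table, and keys are placeholders) contrasting a point read with a filter that forces a scan:

```python
from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    "<your-connection-string>", table_name="Orders"
)

# Point query: PartitionKey + RowKey route straight to one partition.
# This is the cheapest read the service offers.
order = client.get_entity(partition_key="customer-42", row_key="order-001")

# Filter on a non-key property: nothing is indexed except the keys, so
# the service walks every partition, i.e. a full table scan.
for entity in client.query_entities("Total gt 10.0"):
    print(entity["RowKey"])
```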
We have updated the storage architecture and raised a bunch of the target limits.
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
Each storage account now supports 20,000 operations per second; each partition supports 2,000 operations per second.
How partitions interact is a little subtle, and it depends on how they are being used (which changes over time).
Azure Storage has two layers: one set of servers handles partition ranges, and another set handles the actual storage (i.e., the three replicas). When a table is cold, all of its partitions may be serviced by one server. As partitions come under sustained stress, the system automatically spreads the workload (i.e., shards it) across additional servers. The shards are made on partition boundaries.
For low or medium stress, you may never hit the threshold to shard, or may shard only a minimal number of times. The access pattern also has some impact: if you are only appending, sharding won't help. Random access across all partitions scales by far the best. While the system is rebalancing, you will get 503 responses for a few seconds, and then operations will return to normal.
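The Azure SDKs retry these 503s for you, but if you are calling the REST API directly you need to handle them yourself. A minimal illustrative sketch in Python (the azure.core exception type is real; the wrapper function is hypothetical):

```python
# Illustrative retry for the brief 503s seen while a partition is being
# rebalanced; Azure SDK clients already do this internally.
import time

from azure.core.exceptions import HttpResponseError

def with_rebalance_retry(operation, max_attempts=5):
    """Run `operation` (any zero-argument callable that hits the table),
    backing off and retrying when the service returns HTTP 503."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except HttpResponseError as err:
            if err.status_code != 503 or attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # the shard move usually finishes in seconds
```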
If you do a table scan, you will actually make multiple round trips to the table. When a query reaches the end of a partition, the response is returned with any data found (or no data, if the criteria were not met) and a continuation token. The query is then resubmitted (and returned with another token) again and again until you reach the bottom of the table. The SDK abstracts this away, but if you made direct REST calls you would see it.
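You can watch those round trips happen by dropping down to page-level iteration. A sketch with the azure-data-tables Python SDK, whose `by_page()` iterator exposes a `continuation_token` corresponding to the continuation headers on the wire (connection string and table name are placeholders):

```python
# Each page below corresponds to one REST request/response pair.
from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    "<your-connection-string>", table_name="Orders"
)

pages = client.query_entities("Total gt 10.0").by_page()
for page in pages:
    batch = list(page)
    # The token is None once the scan has reached the bottom of the table.
    print(len(batch), "entities; continuation:", pages.continuation_token)
```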
From a table performance perspective, the scan would only impact the partition in which it is currently scanning.
To speed up a broad query that hits multiple partitions, you could actually break it up into multiple parallel accesses (e.g., one thread per partition) and then coalesce the results in the client. Really it depends on how much data you are getting back, how big the table is, etc.
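Here is one way that fan-out might look in Python with the azure-data-tables SDK; the table name and partition keys are made up, and it assumes you already know which partitions you need to hit:

```python
# Hypothetical client-side fan-out: scan each partition on its own
# thread, then coalesce the results locally.
from concurrent.futures import ThreadPoolExecutor

from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    "<your-connection-string>", table_name="Orders"
)

def scan_partition(pk):
    # Each query is confined to one partition, so the scans proceed
    # independently rather than walking the table end to end.
    return list(client.query_entities(f"PartitionKey eq '{pk}'"))

partitions = ["customer-41", "customer-42", "customer-43"]
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    results = [e for chunk in pool.map(scan_partition, partitions) for e in chunk]
```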