Azure Table Storage partition individual performance

1 Answer

As a general rule, you want to avoid table scans whenever possible. They are very expensive operations, especially if you have a lot of partitions: not so much from a table stress standpoint, but because they have very high aggregate latency (explained below). That said, sometimes there is simply no avoiding it.

We have updated the storage architecture and raised a bunch of the target limits.

http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx

Each storage account now has a target of 20,000 operations per second, and each partition a target of 2,000 operations per second.
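As a rough illustration of how the two targets combine, here is a small arithmetic sketch. It assumes load is spread perfectly evenly across partitions and treats the figures as targets rather than guarantees; the function names are made up for this example.

```python
# Scalability targets quoted above (targets, not hard guarantees).
ACCOUNT_TARGET_OPS = 20_000    # operations per second, per storage account
PARTITION_TARGET_OPS = 2_000   # operations per second, per partition


def max_throughput(active_partitions: int) -> int:
    """Upper bound on ops/sec if load is spread evenly over the partitions."""
    return min(ACCOUNT_TARGET_OPS, active_partitions * PARTITION_TARGET_OPS)


def partitions_to_saturate_account() -> int:
    """Smallest number of fully loaded partitions that reaches the account target."""
    # Ceiling division without importing math.
    return -(-ACCOUNT_TARGET_OPS // PARTITION_TARGET_OPS)


if __name__ == "__main__":
    for n in (1, 4, 10, 50):
        print(n, "partitions ->", max_throughput(n), "ops/sec at most")
```

In other words, a workload concentrated on one partition tops out at the partition target, and beyond ten busy partitions the account target becomes the limit.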

How partitions interact is a little subtle: it depends on how they are being used, and it changes over time.

Azure storage has two stages: one set of servers handles partition ranges, and the other set handles the actual storage (i.e. the 3 copies). When a table is cold, all of its partitions may be serviced by one server. As partitions are put under sustained stress, the system begins to automatically spread the workload (i.e. shard) to additional servers. The shards are made on partition boundaries.

For low or medium stress, you may never hit the threshold to shard, or hit it only a minimal number of times. The access pattern also has some impact (if you are only appending, sharding won't help). Random access across all partitions scales best by far. While the system is rebalancing, you will get a 503 response for a few seconds, and then operations return to normal.
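Since the 503 during rebalancing is transient, the usual client-side answer is to retry with a growing delay. The SDKs generally ship their own retry policies, so the following is only a minimal standalone sketch of the idea: `ServerBusy` and `with_retries` are hypothetical names, and the delays are arbitrary choices for illustration.

```python
import random
import time


class ServerBusy(Exception):
    """Hypothetical stand-in for an HTTP 503 response from the table service."""


def with_retries(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on ServerBusy, wait with exponential backoff and retry.

    The last failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            # Add jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists only so the waiting can be swapped out, for instance in a unit test.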

If you do a table scan, you will actually make multiple round trips to the table. When a query reaches the end of a partition, the response is returned with any data found (or no data if nothing matched the criteria) and a continuation token. The query is then resubmitted with that token (and returned with a new one) again and again until you get to the bottom of the table. This is abstracted by the SDK, but if you made direct REST calls you would see it.
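To make the round trips visible, here is a toy simulation of the loop the SDK hides. It does not call Azure at all: `TABLE`, `query_page`, and `scan` are hypothetical, and the "token" is simply the next partition key, which only loosely mirrors what the real service returns.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy in-memory table: partition key -> rows. Made-up data for illustration.
TABLE: Dict[str, List[dict]] = {
    "p1": [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}],
    "p2": [{"id": 3, "color": "blue"}],
    "p3": [{"id": 4, "color": "red"}],
}


def query_page(predicate: Callable[[dict], bool],
               token: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    """Simulate one round trip: scan a single partition and return its matches
    plus a continuation token naming the next partition (None at the end)."""
    keys = sorted(TABLE)
    current = token if token is not None else keys[0]
    idx = keys.index(current)
    rows = [r for r in TABLE[current] if predicate(r)]
    next_token = keys[idx + 1] if idx + 1 < len(keys) else None
    return rows, next_token


def scan(predicate: Callable[[dict], bool]) -> Tuple[List[dict], int]:
    """Resubmit the query with each continuation token until the table ends."""
    results: List[dict] = []
    round_trips = 0
    token: Optional[str] = None
    while True:
        rows, token = query_page(predicate, token)
        round_trips += 1
        results.extend(rows)  # a page may be empty and still carry a token
        if token is None:
            return results, round_trips
```

Scanning for `color == "red"` here takes three round trips even though only two partitions contain a match, which is where the aggregate latency of a scan comes from.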

From a table performance perspective, the scan only impacts the partition it is currently scanning.

To speed up a broad query that hits multiple partitions, you could break it up into multiple parallel queries (e.g. one thread per partition) and then coalesce the results in the client. Whether that pays off really depends on how much data you are getting back, how big the table is, etc.
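A minimal sketch of that fan-out, using a thread pool and an in-memory stand-in for the table. `PARTITIONS`, `query_partition`, and `parallel_query` are hypothetical; in a real client, `query_partition` would issue a query filtered to a single PartitionKey.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

# Made-up data standing in for a table with three partitions.
PARTITIONS: Dict[str, List[dict]] = {
    "2012-11-01": [{"id": 1, "level": "error"}, {"id": 2, "level": "info"}],
    "2012-11-02": [{"id": 3, "level": "info"}],
    "2012-11-03": [{"id": 4, "level": "error"}],
}


def query_partition(partition_key: str,
                    predicate: Callable[[dict], bool]) -> List[dict]:
    """Hypothetical stand-in for a query restricted to one partition."""
    return [r for r in PARTITIONS[partition_key] if predicate(r)]


def parallel_query(partition_keys: Iterable[str],
                   predicate: Callable[[dict], bool],
                   max_workers: int = 8) -> List[dict]:
    """Run one query per partition concurrently, then coalesce in the client."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of partition_keys.
        per_partition = pool.map(
            lambda pk: query_partition(pk, predicate), partition_keys
        )
        return [row for rows in per_partition for row in rows]
```

Threads fit here because the work is I/O-bound; `max_workers` is an arbitrary cap and would need tuning against the per-partition and per-account targets.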

answered Nov 16 '22 02:11

Pat Filoteo

Tags: performance, azure, azure-storage, azure-table-storage

Dave New
