How does one Azure table storage table with many partition keys compare to many tables with fewer partition keys?

Tags: scalability, azure, partitioning, azure-table-storage

I have a Windows Azure application in which all read queries of TableA are executed on single partitions for a range of rowkeys. The Partition Keys that facilitate this storage scheme are actually flattened names of objects in a hierarchy, such that the Partition Key is formatted like {root}_{child1}_{child2}_{leaf}. I can understand how it might be beneficial to divide this one big TableA into many tables by using the root dimension of the Partition Keys in the naming of the Tables (so the Partition Key would become {child1}_{child2}_{leaf}).

What I want to do is provide as rapid access to this data as I can from as many connections at the same time as possible. It would also be incredible if I could figure out what these limits are or should be.

More specific questions about my proposed change:

Will this make a difference in scalability, i.e. the number of simultaneous data access requests that can be served without affecting performance dramatically? Or that can be served at the same time at all?
Will this make a difference in average performance? In potential performance?

asked Jun 12 '11 04:06

user483679

2 Answers

If every query specifies a partition key, it makes no difference how many tables those partitions are spread across. In other words, the following are equivalent: one table with a thousand partitions versus a thousand tables each with one partition.
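To make the comparison concrete, here is a minimal Python sketch of the two layouts from the question. The function names, the sample path, and the idea of prefixing the root with `TableA` to form a table name are illustrative assumptions, not something from the original post. It also assumes the root is alphanumeric, since table names may only contain letters and digits. The filter string uses the Table service's OData comparison operators (`eq`, `ge`, `lt`).

```python
from typing import NamedTuple, Sequence


class Address(NamedTuple):
    """Where a partition lives: which table, and under which PartitionKey."""
    table: str
    partition_key: str


def single_table_address(path: Sequence[str], table: str = "TableA") -> Address:
    """Layout 1: one big table, whole hierarchy flattened into the PartitionKey."""
    return Address(table, "_".join(path))


def per_root_address(path: Sequence[str], prefix: str = "TableA") -> Address:
    """Layout 2: one table per root, remaining levels form the PartitionKey.

    Hypothetical naming scheme: the table name is prefix + root.
    """
    root, rest = path[0], path[1:]
    return Address(prefix + root, "_".join(rest))


def _quote(value: str) -> str:
    # OData string literals escape a single quote by doubling it.
    return "'" + value.replace("'", "''") + "'"


def range_filter(partition_key: str, row_from: str, row_to: str) -> str:
    """Filter for a RowKey range [row_from, row_to) inside a single partition."""
    return (
        f"PartitionKey eq {_quote(partition_key)} "
        f"and RowKey ge {_quote(row_from)} and RowKey lt {_quote(row_to)}"
    )
```

Either way, each query still names exactly one table and exactly one partition, which is why the two layouts behave the same for this access pattern.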

The main reason I can think of to consider splitting out into multiple tables is that you can delete an entire table in a single operation/transaction, while you can't do that with a range of partitions within the same table. That means for things like logs, where you may want to delete the older ones after a while, it's often better to have different tables for different time ranges.
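As a rough illustration of the log case, the sketch below names one table per month and works out which tables have aged out. The `Logs` prefix, the `YYYYMM` suffix, and the three-month retention are assumptions made for the example only. Each returned name would then be passed to whatever delete-table call your storage client exposes.

```python
from datetime import date
from typing import Iterable, List


def log_table_name(day: date, prefix: str = "Logs") -> str:
    """Table that holds the logs for the month containing `day`, e.g. Logs201106."""
    return f"{prefix}{day.year:04d}{day.month:02d}"


def expired_tables(
    table_names: Iterable[str],
    today: date,
    keep_months: int = 3,
    prefix: str = "Logs",
) -> List[str]:
    """Return the monthly log tables that fall outside the retention window.

    The window keeps the current month plus the `keep_months` months before it.
    Names that do not match prefix + YYYYMM are ignored.
    """
    # Count months from year 0 so that comparisons work across year boundaries.
    oldest_kept = today.year * 12 + (today.month - 1) - keep_months
    expired = []
    for name in table_names:
        suffix = name[len(prefix):]
        if not (name.startswith(prefix) and len(suffix) == 6 and suffix.isdigit()):
            continue
        month_index = int(suffix[:4]) * 12 + int(suffix[4:]) - 1
        if month_index < oldest_kept:
            expired.append(name)
    return expired
```

Dropping a whole table this way is one operation per month of logs, instead of one delete per entity.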

answered Sep 30 '22 09:09

user94559

+1 for Steve's answer.

Some things to add:

It might be worth considering multiple storage accounts, since the storage account is currently the unit of scalability. Each storage account is officially targeted at about 5,000 entities/transactions per second, so if you need more than that you have to use multiple accounts.
There are some subtleties in how the shape of your queries affects performance: if the items are not in the same partition, it's generally quicker to run separate parallel queries than a single query with a complicated where clause.
You may find the posts on the storage team blog particularly helpful: http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx and http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx
You may also need to be aware of the costs: roughly $1 per million storage transactions (hits).
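A minimal Python sketch of the first, second and last points, under stated assumptions: `query_one` is a hypothetical callable that you supply to run a single-partition query with your storage client, the account list is just a list of names, and the hash-based account choice is one possible way to spread partitions over accounts, not something the storage service does for you.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Sequence


def pick_account(partition_key: str, accounts: Sequence[str]) -> str:
    """Map a partition key to one of several storage accounts, stably across runs."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return accounts[int.from_bytes(digest[:8], "big") % len(accounts)]


def query_partitions_in_parallel(
    partition_keys: Iterable[str],
    query_one: Callable[[str], List],
    max_workers: int = 8,
) -> Dict[str, List]:
    """Run one single-partition query per key concurrently.

    Returns {partition_key: rows}. `query_one` is a placeholder for your own
    function that queries exactly one partition.
    """
    keys = list(partition_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as the input keys.
        return dict(zip(keys, pool.map(query_one, keys)))


def transaction_cost(transactions: int, dollars_per_million: float = 1.0) -> float:
    """Approximate cost at the rough $1-per-million figure quoted above."""
    return transactions / 1_000_000 * dollars_per_million
```

For a sense of scale using the figures above: one account running flat out at 5,000 transactions per second for an hour makes 18 million transactions, so `transaction_cost(5000 * 3600)` comes to about $18.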

answered Sep 30 '22 07:09

Stuart
