Limitations on Windows Azure Table Storage accounts

I am designing a multi-tenant web-based SaaS application that will be hosted on Windows Azure and use Table Storage.

The only limits I have found so far are:

  • 5 storage accounts per subscription
  • 100 TB maximum per storage account
  • 1 MB per entity

I am deciding how to best partition my storage for multiple customers:

Option 1: Give each customer their own storage account. Not likely, considering the 5 account default limit.

Option 2: Give each customer their own set of tables. Prefix the table names with customer identifiers, such as a Books table split as "CustA_Books", "CustB_Books", etc.

Option 3: Have one set of tables, but prefix the partition keys to split the customers. So one "Books" table with partition keys of "CustA_Fiction", "CustA_NonFiction", "CustB_Fiction", "CustB_NonFiction", etc.

What are the pros and cons for options 2 and 3? Is there a limit to the number of tables in a single account that might affect option 2?
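To make the two schemes concrete, here is a minimal Python sketch of the naming conventions (the tenant and category names are hypothetical). One practical note for option 2: Azure table names must be alphanumeric, start with a letter, and be 3-63 characters long, so the underscore in "CustA_Books" would have to be dropped in practice.

    def option2_table_name(tenant: str, table: str) -> str:
        # Option 2: one table per tenant. Azure table names are
        # alphanumeric only, so no underscore separator is allowed.
        return f"{tenant}{table}"            # e.g. "CustABooks"

    def option3_partition_key(tenant: str, category: str) -> str:
        # Option 3: shared table; the tenant ID lives in the
        # PartitionKey instead, where underscores are permitted.
        return f"{tenant}_{category}"        # e.g. "CustA_Fiction"

    print(option2_table_name("CustA", "Books"))       # CustABooks
    print(option3_partition_key("CustA", "Fiction"))  # CustA_Fiction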

asked Apr 27 '11 by Matt Johnson-Pint




3 Answers

There are no limits to the number of tables you can create in Windows Azure. Your only limits are the ones you have already listed. Well... I guess there are other limits if you consider that each entity attribute is capped at 64 KB, or the batch limits (100 entities or 4 MB per entity group transaction, whichever is less).
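To make those batch limits concrete, here is a hedged sketch using the modern azure-data-tables Python package (the SDK available when this was written differed); the connection string and table name are placeholders. Note that an entity group transaction also requires every entity in the batch to share the same PartitionKey.

    from azure.data.tables import TableClient

    table = TableClient.from_connection_string(
        "<connection-string>",               # placeholder
        table_name="Books")

    def insert_in_batches(entities, batch_size=100):
        # Entity group transactions are capped at 100 entities (or
        # 4 MB), and all entities must share one PartitionKey.
        for i in range(0, len(entities), batch_size):
            chunk = entities[i:i + batch_size]
            table.submit_transaction([("create", e) for e in chunk])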

Anyhow, the thing to keep in mind here is that your PartitionKey is going to be the most important design decision you make. If you create a PK with the customer name in it, you get some good partitioning benefits. The downside is that if you mix customer data in the same table, you make it harder on yourself to delete data (if you ever need to delete a customer). So, you can use the table as another level of partitioning. The PK you create is scoped to the table you create it under.

What I would consider here is if you ever need to delete the data in bulk or if you ever need to query data across customers (tenants). For the first one, it makes a ton of sense to use separate tables per customer so a delete is one operation versus at best 1 per 100 entities. However, if you need to query across tenants it is harder to join this data when you have multiple tables (that would require multiple queries).
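A sketch of that deletion trade-off, again with the modern azure-data-tables package and placeholder names: under option 2 dropping a tenant is a single delete_table call, while under option 3 every entity has to be queried and deleted individually.

    from azure.data.tables import TableServiceClient

    service = TableServiceClient.from_connection_string(
        "<connection-string>")               # placeholder

    # Option 2: the tenant owns a whole table, so removal is one call.
    service.delete_table("CustABooks")

    # Option 3: shared table, so delete entity by entity (batchable
    # only in groups of up to 100 per PartitionKey).
    table = service.get_table_client("Books")
    for entity in table.query_entities("PartitionKey eq 'CustA_Fiction'"):
        table.delete_entity(
            partition_key=entity["PartitionKey"],
            row_key=entity["RowKey"])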

All things being equal, I would use the tables as another level of partitioning if there is no overlap in tenant functionality and make my life easier should I want to delete a tenant. So, I guess that is option 2.

HTH

answered by dunnry


I highly suggest Option 2

We are also going this route because it adds a nice level of federation for the customer data. As the accepted answer mentions, it is easier to manage adding/deleting customers. Another benefit we have noticed is the 'copy-ability' of a customer's data. This approach makes it much easier to move customer-specific data to other storage accounts or to development environments for testing without affecting the entire lot.

In the SaaS world it also enables customers to get a copy of their own data with little effort, which is also a concern of many SaaS users.

answered by Rentering.com


Another alternative: imagine you have N storage accounts (the limit is 100 storage accounts per subscription), and each storage account has a table per customer.

  1. For table operations that include a PartitionKey, like Insert, Update, Delete, or a point query, compute a hash of the customer name + partition key, take it modulo N (the total number of storage accounts) to find the index of the target storage account, and forward the request to that storage account / table (see the sketch after this list).

  2. For read requests with no partition key, like a range query, you would need to broadcast the request to all storage accounts and merge the results.
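A minimal Python sketch of this routing, with hypothetical account names. A stable hash (MD5 here) is used instead of Python's built-in hash(), which is seeded per process and would break the mapping across restarts.

    import hashlib

    ACCOUNTS = ["acct000", "acct001", "acct002"]  # N = 3, placeholders

    def route(customer: str, partition_key: str) -> str:
        # Hash customer + partition key, then take it modulo N to
        # pick the owning storage account.
        digest = hashlib.md5(
            f"{customer}:{partition_key}".encode()).digest()
        return ACCOUNTS[int.from_bytes(digest[:8], "big") % len(ACCOUNTS)]

    def broadcast_targets():
        # Range queries with no PartitionKey must fan out everywhere.
        return list(ACCOUNTS)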

One other thing to keep in mind concerns naming the multiple storage accounts. Avoid naming the accounts lexicographically; that will cause them to be served from the same partition server on the Azure backend, which goes against the recommended scalability best practices. If you have N storage accounts, prefix each storage account name with a 3-digit hash so they are evenly distributed.
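A sketch of that naming scheme with a hypothetical base name. Azure storage account names are limited to 3-24 lowercase letters and digits, which a short hex prefix satisfies.

    import hashlib

    def account_name(base: str, index: int) -> str:
        # A 3-character hash prefix keeps consecutive account names
        # from being lexicographically adjacent.
        prefix = hashlib.md5(f"{base}{index}".encode()).hexdigest()[:3]
        return f"{prefix}{base}{index}"

    # e.g. ["<abc>saasdata0", ...] where <abc> varies per name
    print([account_name("saasdata", i) for i in range(4)])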

answered by Dogu Arslan