We're creating a multi-tenant application that must segregate data between tenants. Each tenant will save various documents, each of which can fall into several different document categories. We plan to use Azure blob storage for these documents. However, given our user base and the number of documents and size of each one, we're not sure how to best manage storage accounts with our current Azure subscription.
Here are some numbers to consider. With 5,000 users, each storing 27,000 documents of 8 MB per year, that is 1,080 TB per year in total. A single storage account (and therefore any blob container in it) maxes out at 500 TB.
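Here is the arithmetic behind that estimate as a quick sketch. It assumes decimal units (1 TB = 1,000,000 MB). It also assumes an account could be filled right up to the 500 TB cap, which in practice you would avoid.

```python
import math

# Figures from the question; decimal units (1 TB = 1,000,000 MB) are assumed.
USERS = 5_000
DOCS_PER_USER_PER_YEAR = 27_000
DOC_SIZE_MB = 8
ACCOUNT_CAP_TB = 500  # per-account capacity limit quoted above


def yearly_volume_tb(users: int, docs_per_user: int, doc_size_mb: float) -> float:
    """New data written per year, in TB."""
    return users * docs_per_user * doc_size_mb / 1_000_000


def accounts_needed(total_tb: float, cap_tb: float) -> int:
    """Lower bound on storage accounts, assuming each is filled to its cap."""
    return math.ceil(total_tb / cap_tb)


total_tb = yearly_volume_tb(USERS, DOCS_PER_USER_PER_YEAR, DOC_SIZE_MB)
print(f"{total_tb:,.0f} TB per year -> at least "
      f"{accounts_needed(total_tb, ACCOUNT_CAP_TB)} accounts per year")
```

Run as-is, it reports 1,080 TB for the year and a floor of 3 storage accounts. Because the data accumulates, that floor grows every year the documents are retained.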
So my question is: what would be the most efficient and cost-effective way to store this data and stay within the Azure limits?
Here are a few things we've considered:
Create a storage account for each client. THIS DOES NOT WORK because you can only have 100 storage accounts per subscription (this would have been the ideal solution).
Create a blob container for each client. A storage account can hold up to 500 TB, so this could potentially work, except that eventually we would have to split off into other storage accounts. I'm not sure how that would work if a user eventually had data in two accounts. It could get messy.
Perhaps we are missing something fundamentally simple here.
UPDATE: For now, our thought is to use Azure Table storage with a table for each document type. Within each table, the partition key would be the tenant's ID and the row key would be the document ID. Each row would also contain metadata for the document, along with a URI (or something similar) linking to the blob itself.
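Here is a minimal sketch of that table layout in Python. The `title` column, the helper names and the sample values are hypothetical. The name check encodes Azure's table-naming rules: alphanumeric only, 3 to 63 characters, and no leading digit.

```python
import re
from dataclasses import dataclass, asdict

# Table names: alphanumeric only, 3-63 characters, must not start with a digit.
TABLE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]{2,62}")


def table_name_for(doc_type: str) -> str:
    """Derive a per-document-type table name, e.g. 'purchase orders' -> 'PurchaseOrders'."""
    name = "".join(ch for ch in doc_type.title() if ch.isalnum())
    if not TABLE_NAME_RE.fullmatch(name):
        raise ValueError(f"cannot derive a valid table name from {doc_type!r}")
    return name


@dataclass
class DocumentEntity:
    """Hypothetical row shape for one document in its document-type table."""
    PartitionKey: str  # tenant ID: all of a tenant's documents share one partition
    RowKey: str        # document ID, unique within the tenant
    title: str         # example metadata column (illustrative only)
    blob_uri: str      # link to the blob holding the document itself


def make_entity(tenant_id: str, document_id: str, title: str, blob_uri: str) -> dict:
    """Build the property dictionary for one table row."""
    return asdict(DocumentEntity(tenant_id, document_id, title, blob_uri))
```

With the tenant ID as the PartitionKey, all of one tenant's documents of a given type can be read with a single-partition query.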
Not really an answer, but think of it as "food for thought" :). Basically, your architecture should be based on the fact that each storage account has certain scalability targets, and your design should ensure that you don't exceed them, so that storage remains highly available for your application.
Some recommendations:
Pods …
… PartitionKey … became free, and you can assign some other value to it if needed.
Now, coming on to storing files:
Use the Pod concept, wherein files for each tenant reside in the pod storage account for that tenant.
When saving a file, find the tenant's pod storage account, put the file there, and store the blob URL in the Files table.
You can use either a single blob container (e.g. tenant-files) or separate blob containers for each tenant.
… pod is commissioned. However, the downside of a single container is that you can't logically separate files by tenant, so if you want to provide direct access to the files (using a Shared Access Signature), it would be problematic.
Hope this gives you some idea of how you can go about architecting your solution. We're using some of these concepts in our solution (which explicitly uses Azure Storage as its data store). It would be really interesting to see what architecture you come up with.
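The file flow described in this answer goes from tenant to pod storage account to blob, with the blob URL recorded in the Files table. It might look roughly like the sketch below, which rests on several assumptions:

* the tenant-to-pod dictionary, the `FileRow` shape and the function names are hypothetical;
* a plain list stands in for the Files table;
* the actual upload is omitted.

```python
from dataclasses import dataclass

BLOB_ENDPOINT = "https://{account}.blob.core.windows.net"


@dataclass
class FileRow:
    """Hypothetical row in the Files table."""
    PartitionKey: str  # tenant ID
    RowKey: str        # file ID
    blob_url: str      # location of the file in the tenant's pod storage account


def blob_url_for(tenant_pods: dict, tenant_id: str, file_id: str,
                 per_tenant_containers: bool = True) -> str:
    """Work out where a tenant's file lives in its pod storage account."""
    account = tenant_pods[tenant_id]  # pod storage account assigned to this tenant
    if per_tenant_containers:
        container, blob = tenant_id, file_id
    else:
        # One shared container: the tenant prefix keeps names unique, but a
        # container-level SAS would still expose every tenant's files.
        container, blob = "tenant-files", f"{tenant_id}/{file_id}"
    return f"{BLOB_ENDPOINT.format(account=account)}/{container}/{blob}"


def record_file(files_table: list, tenant_pods: dict, tenant_id: str, file_id: str,
                per_tenant_containers: bool = True) -> FileRow:
    """Store the blob URL in the Files table (a plain list stands in for the table)."""
    row = FileRow(tenant_id, file_id,
                  blob_url_for(tenant_pods, tenant_id, file_id, per_tenant_containers))
    files_table.append(row)
    return row
```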
I am just going to share my thoughts on the topic; some of this overlaps with Gaurav Mantri's answer. It is based on a design I came up with after doing something very similar at my current job.
Randomly select a pod from the pod pool when a tenant is created, and store its namespace along with the tenant information.
Provide an API for creating containers, where container names are a composite of the tenant ID formatted with Guid::ToString("N") + <resourcename>. You don't need to present these to your users as containers; they can be folders, worksets or file boxes, whatever name you find.
Provide an API for maintaining documents within these containers.
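Here is a sketch of the container-naming scheme, written in Python for illustration. `uuid.UUID.hex` produces the same 32-digit lowercase form as .NET's `Guid.ToString("N")`. The function names are hypothetical. Container names are capped at 63 characters, so this scheme leaves at most 31 characters for the resource name.

```python
import re
import uuid


def is_valid_container_name(name: str) -> bool:
    """Azure blob container rules: 3-63 chars of lowercase letters, digits and
    hyphens, starting and ending with a letter or digit, no consecutive hyphens."""
    return (3 <= len(name) <= 63
            and re.fullmatch(r"[a-z0-9-]+", name) is not None
            and not name.startswith("-")
            and not name.endswith("-")
            and "--" not in name)


def container_name(tenant_id: uuid.UUID, resource_name: str) -> str:
    """Compose <tenant GUID in "N" form><resourcename> and validate it."""
    # uuid.hex is the same 32-digit lowercase form as .NET Guid.ToString("N").
    name = tenant_id.hex + resource_name.lower()
    if not is_valid_container_name(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name


def tenant_of(container: str) -> uuid.UUID:
    """Recover the owning tenant from the first 32 characters of the name."""
    return uuid.UUID(container[:32])
```

Because the tenant ID is a fixed-width prefix, any component that sees only a container name can tell which tenant owns it.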
This means you can simply grow the pod pool as you get more tenants, and remove pods that are filling up from the pool.
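Here is a minimal sketch of such a pod pool. The 80% fill threshold, the class names and the capacity figure are assumptions for illustration. In a real system the usage numbers would come from monitoring the storage accounts.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Pod:
    account_name: str       # storage account backing this pod (hypothetical name)
    used_tb: float = 0.0
    capacity_tb: float = 500.0
    accepting: bool = True  # flip to False to take a pod out of rotation


@dataclass
class PodPool:
    pods: list = field(default_factory=list)
    fill_threshold: float = 0.8  # assumed: stop placing new tenants at 80% full

    def add(self, pod: Pod) -> None:
        """Grow the pool, e.g. when more tenants arrive."""
        self.pods.append(pod)

    def candidates(self) -> list:
        """Pods that are still in rotation and not filling up."""
        return [p for p in self.pods
                if p.accepting and p.used_tb < p.capacity_tb * self.fill_threshold]

    def assign(self, rng: random.Random) -> Pod:
        """Randomly pick a pod for a new tenant among the candidates."""
        options = self.candidates()
        if not options:
            raise RuntimeError("no pod has room; add a new storage account to the pool")
        return rng.choice(options)
```

At tenant creation you would store `pool.assign(rng).account_name` with the tenant record. Existing tenants stay on their pod even after it stops accepting new ones.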
The benefit of this is that you don't need to maintain two systems for your data (Table storage and Blob storage). Blob storage already has a way to present data as a directory/file hierarchy.
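Blob storage itself has a flat namespace. The "folders" come from listing with a prefix and a `/` delimiter, where the service groups deeper names into BlobPrefix entries. This pure-Python function only imitates that grouping to show the idea; it does not call Azure.

```python
def list_level(blob_names, prefix="", delimiter="/"):
    """Return (virtual folders, blobs) directly under `prefix`, the way a
    List Blobs call with a delimiter groups names into BlobPrefix entries."""
    folders, blobs = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Anything deeper collapses into a single virtual folder entry.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)
```

For example, with blobs `reports/2015/q1.pdf`, `reports/summary.docx` and `readme.txt`, the top level shows one folder (`reports/`) and one file (`readme.txt`).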
On top of the above design, I built an OWIN middleware that sits between clients and Blob storage, basically just forwarding requests from clients to Blob storage.
This step is of course not required, as you can hand out normal SAS tokens and let clients talk directly to Blob storage. But it makes it easy to hook in when actions are executed on files. Each tenant gets its own endpoint: files/tenantid/<resourcename>/
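The routing part of such a broker is mostly string mapping. Here is a minimal sketch that is independent of OWIN and written in Python for brevity. The function name and the tenant-to-pod dictionary are hypothetical. The container name follows the GUID-plus-resource-name scheme described in this answer.

```python
import uuid


def upstream_url(path: str, tenant_pods: dict) -> str:
    """Map a broker path files/<tenantid>/<resourcename>/<blob> to a Blob storage URL.

    tenant_pods maps each tenant's UUID to its pod storage account name.
    URL-encoding of blob names and request signing are left out of this sketch.
    """
    parts = path.strip("/").split("/", 3)
    if len(parts) < 3 or parts[0] != "files":
        raise ValueError(f"expected files/<tenantid>/<resourcename>/..., got {path!r}")
    tenant_id = uuid.UUID(parts[1])
    container = tenant_id.hex + parts[2].lower()  # tenant GUID ("N" form) + resource name
    base = f"https://{tenant_pods[tenant_id]}.blob.core.windows.net/{container}"
    return f"{base}/{parts[3]}" if len(parts) == 4 else base
```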
Such an API also lets you hook into whatever token authentication system you may already be using to authenticate and authorize incoming requests, and then sign the requests in this API.
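Here is a sketch of the authorization check such a broker could apply before signing and forwarding a request. The `Principal` claims shape is hypothetical; it would come from whatever token system you already use. The rule shown is only an example policy: the request must target the caller's own tenant, and writes require write permission.

```python
import uuid
from dataclasses import dataclass

READ_METHODS = {"GET", "HEAD"}


@dataclass(frozen=True)
class Principal:
    """Claims taken from a token your existing auth system has already validated
    (hypothetical shape)."""
    user_id: str
    tenant_id: uuid.UUID
    can_write: bool


def authorize(principal: Principal, method: str, path: str) -> bool:
    """Allow a files/<tenantid>/... request only for the caller's own tenant,
    and only reads unless the caller holds write permission."""
    parts = path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "files":
        return False
    try:
        path_tenant = uuid.UUID(parts[1])
    except ValueError:
        return False  # malformed tenant segment
    if path_tenant != principal.tenant_id:
        return False
    return method.upper() in READ_METHODS or principal.can_write
```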
Using the above API broker combined with metadata, one can take it a step further: modify incoming requests to always include metadata, and filter the XML returned from Blob storage before sending it to clients, to hide containers or blobs. For example, when a user deletes a blob, set x-ms-meta-status:deleted on it and filter such blobs out when returning blobs/containers. This way you can add different procedures for deleting data behind the scenes.
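Here is a sketch of the listing filter, using Python's standard XML parser on a simplified List Blobs response. It assumes the broker added include=metadata to the listing request. In that case each Blob element carries a Metadata element, in which `x-ms-meta-status` appears as a `status` child. The set of hidden statuses is configurable.

```python
import xml.etree.ElementTree as ET

# Statuses the broker hides from clients: "deleted" marks a soft-deleted blob,
# "hidden" marks placeholder blobs.
HIDDEN_STATUSES = frozenset({"deleted", "hidden"})


def filter_blob_listing(body, hidden=HIDDEN_STATUSES) -> str:
    """Drop <Blob> entries whose status metadata is in `hidden` from a List Blobs
    response body (str or bytes). Assumes the listing was requested with
    include=metadata, so each <Blob> carries a <Metadata> element."""
    root = ET.fromstring(body)
    blobs = root.find("Blobs")
    if blobs is not None:
        for blob in list(blobs.findall("Blob")):  # copy: we remove while looping
            if blob.findtext("Metadata/status") in hidden:
                blobs.remove(blob)
    return ET.tostring(root, encoding="unicode")
```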
One should be careful here: you don't want to put too much logic in the broker, since it adds a penalty to every request, but done smartly this can work very nicely.
This extension would also let your users create "empty" subfolders inside a container, by placing a zero-byte file with status:hidden that is also filtered out (remember that Blob storage can only show a virtual folder if there is something in it). This could also be achieved using Table storage.
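Here is a sketch of the placeholder idea. The marker blob's name (`.folder` here) is a hypothetical choice. A dictionary of blob name to metadata stands in for a listing that includes metadata.

```python
PLACEHOLDER = ".folder"  # hypothetical name for the zero-byte marker blob


def placeholder_for(folder_path: str):
    """Blob name, (empty) body and metadata that make an 'empty' folder exist."""
    folder = folder_path.strip("/")
    if not folder:
        raise ValueError("folder path must not be empty")
    return f"{folder}/{PLACEHOLDER}", b"", {"status": "hidden"}


def visible_entries(blobs: dict, prefix: str = ""):
    """blobs maps blob name -> metadata dict. Return (folders, files) directly
    under `prefix`; marker blobs keep their folder visible but are never listed."""
    folders, files = set(), []
    for name, meta in blobs.items():
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if "/" in rest:
            folders.add(prefix + rest.split("/", 1)[0] + "/")
        elif meta.get("status") not in ("hidden", "deleted"):
            files.append(name)
    return sorted(folders), sorted(files)
```

After uploading the marker for `projects/new/`, the folder shows up when browsing `projects/`. Browsing `projects/new/` itself shows an empty folder.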
Another great extension point is to index each blob in Azure Search so that its content can be found; this is probably my favorite. I don't see any good solution using just Blob storage or Table storage that could give you good search functionality, or to some extent even a good filtering experience. Azure Search would give users a really rich experience for finding their content again.
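Here is a sketch of building the indexing payload for Azure Search's index-documents REST call, which takes a JSON object with a `value` array of actions. The field names `id`, `tenantId`, `name` and `content` are hypothetical and must match your own index definition. Document keys may contain only letters, digits, underscores, dashes and equal signs. Base64-encoding the blob URL is one way to satisfy that restriction.

```python
import base64
import json


def search_key(blob_url: str) -> str:
    """URL-safe base64 uses only letters, digits, '-', '_' and '=' padding."""
    return base64.urlsafe_b64encode(blob_url.encode("utf-8")).decode("ascii")


def index_batch(tenant_id: str, blobs: list) -> str:
    """JSON body for an index-documents request; field names are hypothetical and
    must match whatever index you define."""
    actions = [{
        "@search.action": "mergeOrUpload",  # insert, or update if the key exists
        "id": search_key(b["url"]),
        "tenantId": tenant_id,              # filter on this at query time
        "name": b["name"],
        "content": b.get("text", ""),       # extracted text, if you have it
    } for b in blobs]
    return json.dumps({"value": actions})
```

At query time the broker would add a filter such as `tenantId eq 'tenant1'`, so a tenant only ever sees its own documents. For that to work, `tenantId` needs to be marked filterable in the index.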
Another extension is to automatically create a snapshot every time a file is modified. This becomes even easier with the broker API; otherwise, monitoring the logs is an option.
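If the broker sees every write, snapshotting becomes a small planning step before forwarding. This sketch assumes the client's query string holds only the operation parameters, with signing done later. It issues a Snapshot Blob call (PUT with comp=snapshot) before any Put Blob or Put Block List that overwrites an existing blob. The function name and the call-tuple shape are hypothetical.

```python
def upstream_calls(method: str, blob_url: str, query: str, blob_exists: bool) -> list:
    """Plan the Blob storage calls the broker makes for one client request.

    `query` is the client's operation query (e.g. "comp=blocklist"); auth/SAS
    parameters are assumed to be added later, when the broker signs the request.
    """
    method = method.upper()
    calls = []
    # A plain Put Blob or a Put Block List commit replaces the blob's content.
    overwrites = method == "PUT" and query in ("", "comp=blocklist")
    if overwrites and blob_exists:
        # Snapshot Blob (PUT ...?comp=snapshot) preserves the current version first.
        calls.append(("PUT", f"{blob_url}?comp=snapshot"))
    calls.append((method, f"{blob_url}?{query}" if query else blob_url))
    return calls
```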
These ideas come from a project I started and wanted to share, but since I am busy at work for the coming months, I don't see myself releasing it before the summer holidays give me time to finish. The goal of the project is to provide a NuGet package that lets other developers quickly set up the broker API I mentioned above and configure a multi-tenant Blob storage solution.
I kindly ask you to upvote this answer if you believe such a project could have saved you time in your current development process. This way I can judge whether to spend more time on the project.
I think Gaurav Mantri's answer is more on point for the question above, but I just wanted to share my ideas on the topic.