Windows Azure Cloud Storage - Impact of huge number of files in root

Sorry if I get any of the terminology wrong here, but hopefully you will get what I mean.

I am using Windows Azure Cloud Storage to store a vast quantity of small files (images, roughly 20 KB each).

At the minute, these files are all stored in the root directory. I understand it's not a normal file system, so maybe root isn't the correct term.

I've tried to find information on the long-term effects of this plan, but with no luck, so if anyone can give me some information I'd be grateful.

Basically, am I going to run into problems if the number of files stored in this root ends up in the hundreds of thousands or millions?

Thanks,

Steven

asked Jul 30 '10 by Steven Elliott



2 Answers

I've been in a similar situation where we were storing ~10M small files in one blob container. Accessing individual files through code was fine and there weren't any performance problems.

Where we did have problems was with managing that many files outside of code. If you're using a storage explorer (either the one that comes with VS2010 or any one of the others), the ones I've encountered don't support the list-blobs-by-prefix API; you can only list the first 5K, then the next 5K, and so on. You can see how this might be a problem when you want to look at the 125,000th file in the container.
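In code, by contrast, prefix listing is straightforward. Here's a minimal sketch with the modern Python SDK (azure-storage-blob, which didn't exist when this was asked); the connection string and container name are placeholders, not values from the question:

    # A minimal sketch, assuming the modern azure-storage-blob Python SDK.
    # The connection string and container name below are placeholders.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        "<your-connection-string>", container_name="images"
    )

    # Listing by prefix filters server-side, so you can jump straight to
    # one slice of the namespace instead of paging from the beginning.
    for blob in container.list_blobs(name_starts_with="000/125/"):
        print(blob.name, blob.size)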

The other problem is that there is no easy way of finding out how many files are in your container (which can be important for knowing exactly how much all of that blob storage is costing you) without writing something that simply iterates over all the blobs and counts them.
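A sketch of that brute-force count, reusing the hypothetical container client from the previous snippet; the SDK transparently fetches results 5,000 at a time behind this iterator:

    # No server-side count exists; every blob must be enumerated.
    count = 0
    total_bytes = 0
    for blob in container.list_blobs():
        count += 1
        total_bytes += blob.size
    print(f"{count} blobs, {total_bytes} bytes total")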

This was an easy problem for us to solve, as our blobs had sequential numeric names, so we simply partitioned them into folders of 1K items each. Depending on how many items you've got, you can group 1K of these folders into subfolders.
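A hypothetical sketch of that partitioning scheme, with an invented id-to-name mapping (1K files per folder, 1K folders per parent); note these "folders" are just a naming convention, since blob storage is flat, but explorers and prefix queries treat "/" as a delimiter:

    def blob_name(file_id: int) -> str:
        # e.g. file 125000 -> "000/125/125000.jpg"
        parent = file_id // 1_000_000        # which group of 1K folders
        folder = (file_id // 1_000) % 1_000  # which folder of 1K files
        return f"{parent:03d}/{folder:03d}/{file_id}.jpg"

    print(blob_name(125_000))  # 000/125/125000.jpg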

answered by knightpfhor


http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/d569a5bb-c4d4-4495-9e77-00bd100beaef

Short Answer: No

Medium Answer: Kind of?

Long Answer: No, but if you query for a file listing it will only return 5,000 results at a time. You'll need to re-query after every 5K to get a full listing, according to that MSDN page.
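A sketch of that 5K-at-a-time paging with the modern Python SDK (azure-storage-blob, an anachronism for this thread; the connection details are placeholders):

    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        "<your-connection-string>", container_name="images"  # placeholders
    )

    # Each page holds at most 5,000 results; the continuation token marks
    # where the next query resumes, i.e. the re-query loop described above.
    pager = container.list_blobs(results_per_page=5000).by_page()
    first_page = next(pager)
    print(sum(1 for _ in first_page), "blobs on this page")
    print("resume from:", pager.continuation_token)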

Edit: Root works fine for describing it. 99.99% of people will grok what you're trying to say.

answered by Caladain