What is the most efficient way to count the blobs in an Azure Storage container?
Right now I can't think of any way other than the code below:
CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
To run the SQL query: name your SQL query in the properties pane on the right, publish it by pressing CTRL+S or selecting the Publish all button, then select the Run button to execute it. The blob count and total size per container are reported in the Results pane.
If you just want to know how many blobs are in a container without writing code you can use the Microsoft Azure Storage Explorer application.
I tried counting blobs using ListBlobs() and for a container with about 400,000 items, it took me well over 5 minutes.
If you have complete control over the container (that is, you control when writes occur), you could cache the size information in the container metadata and update it every time an item gets removed or inserted. Here is a piece of code that would return the container blob count:
// Assumed to target the 1.x Microsoft.WindowsAzure.StorageClient library,
// whose API the calls below match.
using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

static int CountBlobs(string storageAccount, string containerId)
{
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(storageAccount);
    CloudBlobClient blobClient = cloudStorageAccount.CreateCloudBlobClient();
    CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerId);

    // Load the container's properties and metadata from the service.
    cloudBlobContainer.FetchAttributes();
    string count = cloudBlobContainer.Metadata["ItemCount"];
    string countUpdateTime = cloudBlobContainer.Metadata["CountUpdateTime"];

    bool recountNeeded = false;
    if (String.IsNullOrEmpty(count) || String.IsNullOrEmpty(countUpdateTime))
    {
        // No cached count yet.
        recountNeeded = true;
    }
    else
    {
        DateTime dateTime = new DateTime(long.Parse(countUpdateTime));
        // Are we close to the last modified time?
        if (Math.Abs(dateTime.Subtract(cloudBlobContainer.Properties.LastModifiedUtc).TotalSeconds) > 5)
        {
            recountNeeded = true;
        }
    }

    int blobCount;
    if (recountNeeded)
    {
        blobCount = 0;
        BlobRequestOptions options = new BlobRequestOptions();
        // Flat listing returns every blob; the default hierarchical listing
        // would count each virtual directory as a single item.
        options.UseFlatBlobListing = true;
        foreach (IListBlobItem item in cloudBlobContainer.ListBlobs(options))
        {
            blobCount++;
        }

        cloudBlobContainer.Metadata.Set("ItemCount", blobCount.ToString());
        // Store UTC ticks so the value is comparable with LastModifiedUtc.
        cloudBlobContainer.Metadata.Set("CountUpdateTime", DateTime.UtcNow.Ticks.ToString());
        cloudBlobContainer.SetMetadata();
    }
    else
    {
        blobCount = int.Parse(count);
    }

    return blobCount;
}
This, of course, assumes that you update ItemCount/CountUpdateTime every time the container is modified. CountUpdateTime is a heuristic safeguard (if the container did get modified without someone updating CountUpdateTime, this will force a re-count) but it's not reliable.
The API doesn't contain a container count method or property, so you'd need to do something like what you posted. However, you'll need to deal with NextMarker if more than 5,000 items are returned (or if you specify a maximum number of results and the list exceeds it). In that case, make additional calls based on NextMarker and add up the counts.
EDIT: Per smarx: the SDK should take care of NextMarker for you. You'll need to deal with NextMarker if you're working at the API level, calling List Blobs through REST.
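For anyone working at the REST level, the NextMarker loop might look roughly like the sketch below. It is only an illustration of the paging logic: BlobPage, PagedBlobCounter and FakeService are hypothetical names, not SDK or REST client types, and the in-memory fake stands in for the actual List Blobs call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one List Blobs response page (not an SDK type).
public sealed class BlobPage
{
    public IReadOnlyList<string> Names { get; }
    public string NextMarker { get; }   // null or empty when there are no more pages

    public BlobPage(IReadOnlyList<string> names, string nextMarker)
    {
        Names = names;
        NextMarker = nextMarker;
    }
}

public static class PagedBlobCounter
{
    // fetchPage stands in for one List Blobs REST call. It receives the marker
    // returned with the previous page (null for the first call).
    public static long Count(Func<string, BlobPage> fetchPage)
    {
        long total = 0;
        string marker = null;
        do
        {
            BlobPage page = fetchPage(marker);
            total += page.Names.Count;
            marker = page.NextMarker;
        } while (!string.IsNullOrEmpty(marker));
        return total;
    }

    // In-memory stand-in for the service, used only to exercise the loop.
    // The marker here is simply the index of the next unread item.
    public static Func<string, BlobPage> FakeService(IReadOnlyList<string> allNames, int pageSize)
    {
        return marker =>
        {
            int start = string.IsNullOrEmpty(marker) ? 0 : int.Parse(marker);
            List<string> names = allNames.Skip(start).Take(pageSize).ToList();
            int next = start + names.Count;
            return new BlobPage(names, next < allNames.Count ? next.ToString() : null);
        };
    }
}
```

In real code, fetchPage would issue the HTTP request with the marker as a query parameter and parse NextMarker out of the response; the summing loop itself stays the same.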
Alternatively, if you're controlling the blob insertions and deletions (through a WCF service, for example), you can use the blob container's metadata area to store a cached container count that you update with each insert or delete. You'll just need to deal with write concurrency to the container.
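One common way to handle the write concurrency is optimistic: read the count together with a version tag, write back only if the tag is unchanged, and retry otherwise. The sketch below shows just that retry pattern against an in-memory stand-in. FakeContainerMetadata and CachedBlobCount are hypothetical names, and how the conditional write maps onto the real service (a conditional request header, a container lease, or something else) is an assumption left open here.

```csharp
using System;

// Hypothetical in-memory stand-in for container metadata guarded by a version tag.
public sealed class FakeContainerMetadata
{
    private readonly object _gate = new object();
    private long _count;
    private long _etag;

    public (long Count, long ETag) Read()
    {
        lock (_gate) { return (_count, _etag); }
    }

    // Mimics a conditional write: succeeds only if nobody has written
    // since expectedETag was read.
    public bool TryWrite(long newCount, long expectedETag)
    {
        lock (_gate)
        {
            if (_etag != expectedETag) return false;
            _count = newCount;
            _etag++;
            return true;
        }
    }
}

public static class CachedBlobCount
{
    // Applies delta (+1 after an insert, -1 after a delete) and retries
    // whenever another writer got in first.
    public static long Adjust(FakeContainerMetadata metadata, long delta)
    {
        while (true)
        {
            var (count, etag) = metadata.Read();
            long updated = count + delta;
            if (metadata.TryWrite(updated, etag)) return updated;
        }
    }
}
```

A service would call CachedBlobCount.Adjust(metadata, 1) after a successful upload and CachedBlobCount.Adjust(metadata, -1) after a successful delete. Under heavy write load the retries can pile up, so a periodic full recount is still a sensible safety net.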