Getting blob count in an Azure Storage container

What is the most efficient way to get the number of blobs in an Azure Storage container?

Right now I can't think of any way other than the code below:

CloudBlobContainer container = GetContainer("mycontainer");
var count = container.ListBlobs().Count();
asked Jul 28 '11 at 15:07 by brennazoon




3 Answers

If you just want to know how many blobs are in a container without writing code, you can use the Microsoft Azure Storage Explorer application:

  1. Open the desired blob container.
  2. Click the Folder Statistics icon.
  3. Read the blob count in the Activities window.
answered Oct 07 '22 at 12:10 by Matt


I tried counting blobs using ListBlobs(), and for a container with about 400,000 items it took well over 5 minutes.

If you have complete control over the container (that is, you control when writes occur), you could cache the blob count in the container metadata and update it every time an item is inserted or removed. The following method returns the container's blob count, using the cached value when it looks current and recounting otherwise:

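using System;
using Microsoft.WindowsAzure;                 // CloudStorageAccount (Storage Client Library 1.x)
using Microsoft.WindowsAzure.StorageClient;   // CloudBlobClient, CloudBlobContainer, BlobRequestOptions

static int CountBlobs(string storageAccount, string containerId)
{
    CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(storageAccount);
    CloudBlobClient blobClient = cloudStorageAccount.CreateCloudBlobClient();
    CloudBlobContainer cloudBlobContainer = blobClient.GetContainerReference(containerId);

    // Load the container's metadata and properties (including LastModifiedUtc).
    cloudBlobContainer.FetchAttributes();

    string count = cloudBlobContainer.Metadata["ItemCount"];
    string countUpdateTime = cloudBlobContainer.Metadata["CountUpdateTime"];

    bool recountNeeded = false;

    if (String.IsNullOrEmpty(count) || String.IsNullOrEmpty(countUpdateTime))
    {
        recountNeeded = true;
    }
    else
    {
        // CountUpdateTime is stored as UTC ticks (see the write below).
        DateTime dateTime = new DateTime(long.Parse(countUpdateTime), DateTimeKind.Utc);

        // Was the cached count written at (roughly) the container's last modification?
        if (Math.Abs(dateTime.Subtract(cloudBlobContainer.Properties.LastModifiedUtc).TotalSeconds) > 5)
        {
            recountNeeded = true;
        }
    }

    int blobCount;
    if (recountNeeded)
    {
        blobCount = 0;
        BlobRequestOptions options = new BlobRequestOptions();
        // A flat listing returns the blobs inside virtual directories instead of
        // CloudBlobDirectory entries; no metadata is needed just to count them.
        options.UseFlatBlobListing = true;
        options.BlobListingDetails = BlobListingDetails.None;

        // The SDK follows the continuation markers between pages for you.
        foreach (IListBlobItem item in cloudBlobContainer.ListBlobs(options))
        {
            blobCount++;
        }

        cloudBlobContainer.Metadata.Set("ItemCount", blobCount.ToString());
        // Use UTC ticks so the value is comparable with Properties.LastModifiedUtc.
        cloudBlobContainer.Metadata.Set("CountUpdateTime", DateTime.UtcNow.Ticks.ToString());
        cloudBlobContainer.SetMetadata();
    }
    else
    {
        blobCount = int.Parse(count);
    }

    return blobCount;
}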

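This, of course, assumes that you update ItemCount and CountUpdateTime every time a blob is added to or removed from the container. The CountUpdateTime check is only a heuristic safeguard: it forces a recount when the container's Last-Modified time has drifted from the time the count was cached. It is not reliable, though, because uploading or deleting blobs does not change the container's Last-Modified time; only changes to the container itself (such as its metadata or properties) do.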
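A minimal sketch of the write side, assuming every upload and delete in your code calls one of two hooks afterwards. ContainerCountCache, OnBlobAdded and OnBlobDeleted are hypothetical names, and a plain Dictionary stands in for the container metadata; in real code you would fetch the attributes, change the metadata and call SetMetadata() the way CountBlobs does. If ItemCount is missing, the hooks leave it missing, so the next CountBlobs call does a full recount instead of trusting a partial number.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical write-path helper; the dictionary stands in for container metadata.
public class ContainerCountCache
{
    public IDictionary<string, string> Metadata { get; } = new Dictionary<string, string>();

    // Call after a blob has been uploaded successfully.
    public void OnBlobAdded() => Adjust(+1);

    // Call after a blob has been deleted successfully.
    public void OnBlobDeleted() => Adjust(-1);

    private void Adjust(int delta)
    {
        // No cached count yet: leave it absent so the reader recounts from scratch.
        if (!Metadata.TryGetValue("ItemCount", out var s))
        {
            return;
        }

        int current = int.Parse(s, CultureInfo.InvariantCulture);
        Metadata["ItemCount"] = (current + delta).ToString(CultureInfo.InvariantCulture);

        // UTC ticks, so the value is comparable with the container's LastModifiedUtc.
        Metadata["CountUpdateTime"] = DateTime.UtcNow.Ticks.ToString(CultureInfo.InvariantCulture);
    }
}

public static class Program
{
    public static void Main()
    {
        var cache = new ContainerCountCache();
        cache.Metadata["ItemCount"] = "0";   // seeded by an initial full count
        cache.OnBlobAdded();
        cache.OnBlobAdded();
        cache.OnBlobAdded();
        cache.OnBlobDeleted();
        Console.WriteLine(cache.Metadata["ItemCount"]); // 2
    }
}
```

This sketch assumes a single writer; concurrent writers need the read-modify-write serialized.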

answered Oct 07 '22 at 14:10 by David Airapetyan


The API doesn't provide a method or property that returns the number of blobs in a container, so you'd need to do something like what you posted. However, a single List Blobs call returns at most 5,000 items (or fewer, if you specify a smaller maximum), so for larger containers the response includes a NextMarker. You then make additional calls, passing that marker, and add up the counts.

EDIT: Per smarx, the SDK should handle NextMarker for you. You only need to deal with NextMarker yourself if you're working at the API level, calling List Blobs through REST.
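To make the NextMarker loop concrete, here is a minimal C# sketch that does not depend on any SDK. ListBlobsPage is a hypothetical delegate standing in for one List Blobs REST call: it takes the marker from the previous response (null on the first call) and returns the number of items in that page plus the NextMarker value, which is empty when the listing is complete. Sending the HTTP request and parsing the XML response are left out.

```csharp
using System;

public static class BlobCounter
{
    // Hypothetical stand-in for one List Blobs call: returns the number of items
    // in the page and the NextMarker (null or empty when there are no more pages).
    public delegate (int Count, string NextMarker) ListBlobsPage(string marker);

    public static long CountAllBlobs(ListBlobsPage listPage)
    {
        long total = 0;
        string marker = null;   // no marker on the first request
        do
        {
            var page = listPage(marker);
            total += page.Count;
            marker = page.NextMarker;   // opaque value, passed back unchanged
        } while (!string.IsNullOrEmpty(marker));
        return total;
    }
}

public static class Program
{
    public static void Main()
    {
        // Simulated container of 12,345 blobs served 5,000 per page.
        const int totalBlobs = 12345;
        const int pageSize = 5000;
        int calls = 0;

        BlobCounter.ListBlobsPage fakeService = marker =>
        {
            calls++;
            int start = marker == null ? 0 : int.Parse(marker);
            int count = Math.Min(pageSize, totalBlobs - start);
            int next = start + count;
            return (count, next < totalBlobs ? next.ToString() : null);
        };

        long n = BlobCounter.CountAllBlobs(fakeService);
        Console.WriteLine($"{n} blobs in {calls} calls"); // 12345 blobs in 3 calls
    }
}
```

The fake service uses numeric markers only for the simulation; real NextMarker values are opaque strings and should be passed back exactly as received.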

Alternatively, if you control blob insertions and deletions (through a WCF service, for example), you can store a cached blob count in the container's metadata and update it with each insert or delete. You'll just need to handle concurrent writes to that count.
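One way to handle the concurrency, assuming all inserts and deletes go through a single service process, is to serialize the read-modify-write of the count with a lock. SerializedBlobCount and Apply are hypothetical names; persisting the value to the container metadata is only indicated by a comment. A lock does not help across multiple service instances, which would need some cross-process coordination that this sketch does not cover.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical in-process counter for a service through which all writes are routed.
public class SerializedBlobCount
{
    private readonly object _gate = new object();
    private long _count;

    public SerializedBlobCount(long initialCount) => _count = initialCount;

    // The read-modify-write happens under one lock, so two requests finishing at
    // the same time cannot both read the old value and lose an update.
    public long Apply(int delta)
    {
        lock (_gate)
        {
            _count += delta;
            // In real code, write _count to the container metadata here, still inside
            // the lock, so stored values are written in the same order as updates.
            return _count;
        }
    }

    public long Current
    {
        get { lock (_gate) { return _count; } }
    }
}

public static class Program
{
    public static void Main()
    {
        var counter = new SerializedBlobCount(0);
        // 10,000 simulated uploads and 4,000 simulated deletes racing on the thread pool.
        Parallel.For(0, 14000, i => counter.Apply(i < 10000 ? +1 : -1));
        Console.WriteLine(counter.Current); // 6000
    }
}
```

Holding the lock during the metadata write costs throughput, but it keeps the stored count consistent with the order of updates.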

answered Oct 07 '22 at 13:10 by David Makogon