 

How to list all blobs inside of a specific subdirectory in Azure Cloud Storage using Python?

I worked through the example code from the Azure docs: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

from azure.storage.blob import BlockBlobService
account_name = "x"
account_key = "x"
top_level_container_name = "top_container"

blob_service = BlockBlobService(account_name, account_key)

print("\nList blobs in the container")
generator = blob_service.list_blobs(top_level_container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)

Now I would like to walk the container at a finer grain. My container top_level_container_name has several subdirectories:

  • top_level_container_name/dir1
  • top_level_container_name/dir2
  • etc., following that pattern

I would like to be able to list all of the blobs that are inside just one of those directories. For instance:

  • dir1/a.jpg
  • dir1/b.jpg
  • etc

How do I get a generator of just the contents of dir1 without having to walk all of the other dirs? (I would also take a list or dictionary)

I tried appending /dir1 to the container name, i.e. top_level_container_name = "top_container/dir1", but that didn't work. I get back the error azure.common.AzureHttpError: The requested URI does not represent any resource on the server. ErrorCode: InvalidUri

The docs do not even seem to have any info on BlockBlobService.list_blobs(): https://docs.microsoft.com/en-us/python/api/azure.storage.blob.blockblobservice.blockblobservice?view=azure-python

Update: list_blobs() comes from https://github.com/Azure/azure-storage-python/blob/ff51954d1b9d11cd7ecd19143c1c0652ef1239cb/azure-storage-blob/azure/storage/blob/baseblobservice.py#L1202
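
Judging from that source, list_blobs() accepts a prefix parameter that filters the listing server-side. A minimal sketch with the legacy SDK (reusing blob_service and top_level_container_name from the quickstart code above):

# keep the container name as "top_container"; filter by blob-name prefix instead
generator = blob_service.list_blobs(top_level_container_name, prefix="dir1/")
for blob in generator:
    print("\t Blob name: " + blob.name)

This leaves the container name untouched and moves the directory filtering into prefix, which avoids the InvalidUri error.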

asked Jul 03 '18 by aaron




2 Answers

I was not able to import BlockBlobService; BlobServiceClient seems to be the new alternative in the v12 SDK. Following the official doc, I found this:

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

Create a Blob Storage Account client

connect_str = "<connectionstring>"  # placeholder: your storage account connection string
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
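
If the connection string lives in an environment variable (the convention the v12 quickstart uses), it can be read like this. AZURE_STORAGE_CONNECTION_STRING is just the quickstart's variable name, not something the SDK requires:

import os

# read the connection string from the environment instead of hard-coding it
connect_str = os.getenv("AZURE_STORAGE_CONNECTION_STRING")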

Create a container client

container_name = "dummy"
container_client = blob_service_client.get_container_client(container_name)

This will list all blobs in the container under the dir1 folder (virtual directory):

blob_list = container_client.list_blobs(name_starts_with="dir1/")
for blob in blob_list:
    print("\t" + blob.name)
answered Sep 20 '22 by Prashant Babber


The azurebatchload module provides for this and more. You can filter on folder or filename, and choose to get the result in various formats:

  • list
  • dictionary with extended info
  • pandas dataframe
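
The package is published on PyPI, so assuming the standard package name it can be installed with:

pip install azurebatchload

The examples below also assume your storage credentials are already configured somewhere the module can find them; check the package's README for the exact setup.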

1. List a whole container with just the filenames as a list.

from azurebatchload import Utils

list_blobs = Utils(container='containername').list_blobs()

2. List a whole container with just the filenames as a dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   dataframe=True
).list_blobs()

3. List a folder in a container.

from azurebatchload import Utils

list_blobs = Utils(
   container='containername',
   name_starts_with="foldername/"
).list_blobs()

4. Get extended information on a folder.

from azurebatchload import Utils

dict_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True
).list_blobs()

5. Get extended information on a folder, returned as a pandas dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True,
   dataframe=True
).list_blobs()

disclaimer: I am the author of the azurebatchload module.

answered Sep 22 '22 by Erfan