I worked through the example code from the Azure docs https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python
from azure.storage.blob import BlockBlobService
account_name = "x"
account_key = "x"
top_level_container_name = "top_container"
blob_service = BlockBlobService(account_name, account_key)
print("\nList blobs in the container")
generator = blob_service.list_blobs(top_level_container_name)
for blob in generator:
    print("\t Blob name: " + blob.name)
Now I would like to know how to get more fine-grained in my container walking. My container top_level_container_name has several subdirectories.
I would like to list all of the blobs that are inside just one of those directories. For instance,
how do I get a generator of just the contents of dir1 without having to walk all of the other dirs? (A list or dictionary would also be fine.)
I tried appending /dir1 to the container name, so that top_level_container_name = "top_container/dir1",
but that didn't work. I get back azure.common.AzureHttpError: The requested URI does not represent any resource on the server. ErrorCode: InvalidUri
The docs do not even seem to have any info on BlockBlobService.list_blobs(): https://docs.microsoft.com/en-us/python/api/azure.storage.blob.blockblobservice.blockblobservice?view=azure-python
Update: list_blobs() comes from https://github.com/Azure/azure-storage-python/blob/ff51954d1b9d11cd7ecd19143c1c0652ef1239cb/azure-storage-blob/azure/storage/blob/baseblobservice.py#L1202
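For what it's worth, the baseblobservice.py source linked above shows that list_blobs() accepts a prefix argument, which filters server-side, so a single virtual directory can be listed without walking the others. A minimal sketch, assuming the legacy azure-storage SDK (BlockBlobService, versions <= 2.x); the helper name is mine:

```python
def list_dir_blobs(service, container_name, directory):
    """Return the names of all blobs under one virtual directory.

    `service` is a legacy BlockBlobService. The prefix filter is applied
    server-side, so the other directories are never walked.
    """
    prefix = directory.rstrip("/") + "/"
    return [blob.name for blob in service.list_blobs(container_name, prefix=prefix)]
```

With the question's setup this would be called as list_dir_blobs(blob_service, top_level_container_name, "dir1").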
If you wish to get all the blob names in all the containers in a storage account, just call blob_service.list_containers() to iterate through each container, and list all blobs under each one. This is also a useful article on how to use Azure Blob Storage from Python.
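That loop can be sketched as follows, assuming the same legacy BlockBlobService object from the question (list_containers() and list_blobs() are both documented iterators; the helper name is mine):

```python
def all_blob_names(service):
    """Map each container name in the storage account to its blob names.

    `service` is a legacy BlockBlobService (azure-storage-blob <= 2.x).
    """
    return {
        container.name: [blob.name for blob in service.list_blobs(container.name)]
        for container in service.list_containers()
    }
```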
Not able to import BlockBlobService. It seems BlobServiceClient is the new alternative. I followed the official doc and found this:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
Create a Blob Storage Account client
connect_str = "<connectionstring>"  # paste your storage account connection string here
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
Create a container client
container_name="dummy"
container_client=blob_service_client.get_container_client(container_name)
This will list all blobs in the container that are inside the dir1 folder/directory:
blob_list = container_client.list_blobs(name_starts_with="dir1/")
for blob in blob_list:
    print("\t" + blob.name)
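Related: if you only want one level at a time (the blobs directly under dir1 plus a marker per subdirectory, rather than every nested blob), the v12 SDK also offers ContainerClient.walk_blobs, which groups results on the "/" delimiter. A sketch reusing the container_client from above; the helper name is mine:

```python
def list_one_level(container_client, directory):
    """List the immediate children of a virtual directory.

    With its default "/" delimiter, walk_blobs yields the blobs at this
    level plus a BlobPrefix entry per subdirectory, instead of recursing.
    """
    prefix = directory.rstrip("/") + "/"
    return [item.name for item in container_client.walk_blobs(name_starts_with=prefix)]
```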
The module azurebatchload
provides for this and more. You can filter on folder or filenames, and choose to get the result in various formats:
from azurebatchload import Utils

# all blobs in the container, as a plain list
list_blobs = Utils(container='containername').list_blobs()

# the same, as a DataFrame
df_blobs = Utils(
    container='containername',
    dataframe=True
).list_blobs()

# only the blobs under one folder
list_blobs = Utils(
    container='containername',
    name_starts_with="foldername/"
).list_blobs()

# one folder, with extended info
dict_blobs = Utils(
    container='containername',
    name_starts_with="foldername/",
    extended_info=True
).list_blobs()

# one folder, extended info, as a DataFrame
df_blobs = Utils(
    container='containername',
    name_starts_with="foldername/",
    extended_info=True,
    dataframe=True
).list_blobs()
disclaimer: I am the author of the azurebatchload module.