
Upload multiple files from a folder to Azure Blob storage using the Azure Storage SDK for Python

I have some images in a local folder on my Windows machine. I want to upload all of the images to the same container in Azure Blob storage.

I know how to upload a single file with the Azure Storage SDK's BlockBlobService.create_blob_from_path(), but I don't see a way to upload all images in the folder at once.

However, Azure Storage Explorer provides this functionality, so it must be possible somehow.

Is there a function that provides this, or do I have to loop over all files in the folder and call create_blob_from_path() multiple times against the same container?

asked Mar 06 '23 by gehbiszumeis


2 Answers

There is no direct way to do this. You can look through the Azure Storage Python SDK sources, blockblobservice.py and baseblobservice.py, for the details.

As you suspected, you have to loop over the files. Sample code below:

from azure.storage.blob import BlockBlobService
import os

def run_sample():
    block_blob_service = BlockBlobService(account_name='your_account', account_key='your_key')
    container_name = 't1s'

    local_path = "D:\\Test\\test"

    # upload each file in the folder as its own blob, keeping the file name
    for file_name in os.listdir(local_path):
        local_file = os.path.join(local_path, file_name)
        if os.path.isfile(local_file):  # skip subdirectories
            block_blob_service.create_blob_from_path(container_name, file_name, local_file)


# Main method.
if __name__ == '__main__':
    run_sample()

The files in the local folder: (screenshot)

After the code runs, they show up in Azure: (screenshot)
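
Note that BlockBlobService comes from the older azure-storage-blob v2 package. On the current v12 SDK the same loop can be written against ContainerClient; a minimal sketch, assuming a connection string and an existing container named t1s:

from azure.storage.blob import ContainerClient
import os

def run_sample_v12():
    # v12 SDK equivalent of the loop above; the container must already exist
    container = ContainerClient.from_connection_string(
        conn_str='CONNECTION STRING', container_name='t1s')

    local_path = "D:\\Test\\test"
    for file_name in os.listdir(local_path):
        local_file = os.path.join(local_path, file_name)
        if os.path.isfile(local_file):  # skip subdirectories
            with open(local_file, "rb") as data:
                container.upload_blob(name=file_name, data=data, overwrite=True)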

answered Mar 07 '23 by Ivan Yang


You could potentially achieve better upload performance with multithreading. Here is some code that does this:

from azure.storage.blob import BlobClient
from threading import Thread
import os


# Uploads a single blob. May be invoked in thread.
def upload_blob(container, file, index=0, result=None):
    if result is None:
        result = [None]

    try:
        # use the file's base name (with extension) as the blob name
        blob_name = os.path.basename(file)

        blob = BlobClient.from_connection_string(
            conn_str='CONNECTION STRING',
            container_name=container,
            blob_name=blob_name
        )

        with open(file, "rb") as data:
            blob.upload_blob(data, overwrite=True)

        print(f'Upload succeeded: {blob_name}')
        result[index] = True # example of returning result
    except Exception as e:
        print(e) # do something useful here
        result[index] = False # example of returning result


# container: string of container name. This example assumes the container exists.
# files: list of file paths.    
def upload_wrapper(container, files):
    # here, you can define a better threading/batching strategy than what is written
    # this code just creates a new thread for each file to be uploaded
    parallel_runs = len(files)
    threads = [None] * parallel_runs
    results = [None] * parallel_runs
    for i in range(parallel_runs):
        t = Thread(target=upload_blob, args=(container, files[i], i, results))
        threads[i] = t
        threads[i].start()

    for i in range(parallel_runs):  # wait for all threads to finish
        threads[i].join()

    # do something with results here
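
A hypothetical call site, building the file list from a local folder (the folder path and container name here are placeholders):

import os

local_path = "D:\\Test\\test"
files = [os.path.join(local_path, f) for f in os.listdir(local_path)
         if os.path.isfile(os.path.join(local_path, f))]
upload_wrapper('t1s', files)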

There may be better batching strategies than one thread per file - this is just an example to illustrate that, in certain cases, you can achieve greater blob upload performance through threading. One bounded-pool alternative is sketched below.
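
A minimal sketch (not part of the original answer) that reuses upload_blob above but caps concurrency via concurrent.futures:

from concurrent.futures import ThreadPoolExecutor

def upload_pooled(container, files, max_workers=16):
    # at most max_workers uploads run at the same time
    results = [None] * len(files)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i, f in enumerate(files):
            pool.submit(upload_blob, container, f, i, results)
    # exiting the 'with' block waits for all submitted uploads to finish
    return results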

Here are some benchmarks between the sequential looping approach vs. the above threaded approach (482 image files, 26 MB total):

  • Sequential upload: 89 seconds
  • Threaded upload: 28 seconds

I should also add that you might consider invoking azcopy from Python, as that tool may be better suited to your particular need.
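
A minimal sketch of driving azcopy from Python, assuming the azcopy binary is on your PATH; the account, container, and SAS token below are placeholders:

import subprocess

# a wildcard source uploads every file in the folder; replace the
# account/container/SAS placeholders with your own values
subprocess.run([
    "azcopy", "copy",
    "D:\\Test\\test\\*",
    "https://youraccount.blob.core.windows.net/t1s?<SAS-token>",
], check=True)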

answered Mar 07 '23 by ryanpfalz