
Uploading 10,000,000 files to Azure blob storage from Linux

I have some experience with S3, and in the past I have used s3-parallel-put to upload many (millions of) small files there. S3's PUT pricing is expensive compared to Azure's, so I'm thinking of switching to Azure.

However, I can't figure out how to sync a local directory to a remote container using the Azure CLI. In particular, I have the following questions:

1- The aws client provides a sync option. Is there an equivalent option for Azure?

2- Can I upload multiple files to Azure storage concurrently using the CLI? I noticed that there is a -concurrenttaskcount flag for azure storage blob upload, so I assume it must be possible in principle.

user31208 asked Sep 18 '14 06:09


1 Answer

If you prefer the command line and have a recent Python interpreter, the Azure Batch and HPC team has released a Python code sample called blobxfer that offers some AzCopy-like functionality. It supports full recursive directory ingress into Azure Storage as well as full container copy back out to local storage. [Full disclosure: I'm a contributor to this code.]

To answer your questions:

  1. blobxfer supports rsync-like operations using MD5 checksum comparisons for both ingress and egress.
  2. blobxfer performs concurrent operations, both within a single file and across multiple files. However, you may want to split your input across multiple directories and containers, which will not only reduce the script's memory usage but also partition your load better.
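
To make the two points above concrete, here is a minimal sketch of an MD5-based "skip unchanged files" pass combined with concurrent uploads across files. It is not blobxfer's actual code: `remote_md5` (a dict mapping blob name to hex MD5) and the `upload` callable are hypothetical stand-ins for whatever your storage client provides, and the function names are made up for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_upload(local_root, remote_md5):
    """Yield (blob_name, path) for files missing remotely or with a different MD5.

    remote_md5 is a hypothetical dict of blob name -> hex MD5 digest.
    """
    root = Path(local_root)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            if remote_md5.get(name) != md5_hex(path):
                yield name, path


def sync(local_root, remote_md5, upload, workers=8):
    """Upload changed files concurrently; return the sorted uploaded blob names.

    upload is a hypothetical callable taking (blob_name, path).
    """
    pending = list(files_to_upload(local_root, remote_md5))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises the first upload error.
        list(pool.map(lambda item: upload(*item), pending))
    return sorted(name for name, _ in pending)
```

Note that `pending` holds every changed file in memory at once, which is one reason splitting millions of files into smaller directories and running one pass per directory keeps memory usage down.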
fpark answered Oct 02 '22 12:10