 

How to zip files (on Azure Blob Storage) with shutil in Databricks

My trained deep learning model consists of a couple of files in a folder, so this has nothing to do with zipping dataframes.

I want to zip this folder (in Azure Blob storage), but when I try it with shutil it does not seem to work:

import shutil

modelPath = "/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376"
zipPath = "/mnt/databricks/Deploy/"  # no /dbfs here or it will error
shutil.make_archive(base_name=zipPath, format='zip', base_dir=modelPath)

Does anybody have an idea how to do this and get the file onto Azure Blob storage (where I read it from)?

Asked Oct 21 '25 16:10 by Axxeption
1 Answer

In the end I figured it out myself.

It is not possible to write directly to DBFS (Azure Blob storage) with shutil.

You first need to put the file on the local driver node of Databricks, like this (I read somewhere in the documentation that you cannot write directly to Blob storage):

import shutil

modelPath = "/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376"
zipPath = "/tmp/model"  # local path on the driver node; the .zip extension is added automatically

# creates /tmp/model.zip from the model folder
shutil.make_archive(base_name=zipPath, format='zip', base_dir=modelPath)

Then you can copy the file from the local driver node to Blob storage. Please note the "file:" prefix to grab the file from local storage!

blobStoragePath = "dbfs:/mnt/databricks/Models"

# "file:" points dbutils at the driver's local filesystem
dbutils.fs.cp("file:" + zipPath + ".zip", blobStoragePath)
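Putting both steps together, here is a minimal end-to-end sketch of the same approach, assuming it runs in a Databricks notebook where dbutils and display are available; the final dbutils.fs.ls call is just an optional sanity check I added to confirm the archive landed on the mount.

import shutil

# same paths as above; adjust to your own mount and model folder
modelPath = "/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376"
localZip = "/tmp/model"                          # local driver-node path, no .zip suffix
blobStoragePath = "dbfs:/mnt/databricks/Models"

# step 1: zip on the driver's local disk (shutil cannot write to the DBFS mount directly)
shutil.make_archive(base_name=localZip, format='zip', base_dir=modelPath)

# step 2: copy from local disk ("file:" prefix) to the Blob-storage-backed mount
dbutils.fs.cp("file:" + localZip + ".zip", blobStoragePath)

# optional sanity check: list the destination to confirm model.zip arrived
display(dbutils.fs.ls(blobStoragePath))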
Answered Oct 23 '25 07:10 by Axxeption