Write dataframe to blob using azure databricks

Is there a link or sample code showing how to write a dataframe to Azure Blob Storage using Python (not using the PySpark module)?

asked Mar 12 '20 by Sankeertan Samrat

1 Answer

Below is a code snippet for writing a dataframe as CSV data directly to an Azure Blob Storage container from an Azure Databricks notebook.
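
The snippet assumes the storage account name, access key, and output container name are defined beforehand; the values below are illustrative placeholders only:

storage_name = "<your-storage-account-name>"        # placeholder
sas_key = "<your-storage-account-access-key>"       # placeholder; despite the name, fs.azure.account.key expects the account access key
output_container_name = "<your-container-name>"     # placeholder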

# Configure blob storage account access key globally
spark.conf.set(
  "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
  sas_key)

output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
output_blob_folder = "%s/wrangled_data_folder" % output_container_path

# write the dataframe as a single file to blob storage
(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("com.databricks.spark.csv")
 .save(output_blob_folder))
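
Note: com.databricks.spark.csv is the legacy spark-csv package name. On Spark 2.0 and later, the built-in CSV source should behave the same here, for example:

(dataframe
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .format("csv")           # built-in CSV source on Spark 2.0+
 .save(output_blob_folder))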

# Get the name of the wrangled-data CSV file that was just saved to Azure blob storage (it starts with 'part-')
files = dbutils.fs.ls(output_blob_folder)
output_file = [x for x in files if x.name.startswith("part-")]

# Move the wrangled-data CSV file from a sub-folder (wrangled_data_folder) to the root of the blob container
# While simultaneously changing the file name
dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
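
Since the question asks for plain Python without PySpark, here is a minimal sketch using pandas and the azure-storage-blob SDK instead; the connection string, container name, and blob name are illustrative placeholders:

import pandas as pd
from azure.storage.blob import BlobServiceClient

# Placeholder credentials/names -- substitute your own
conn_str = "<your-storage-account-connection-string>"
container_name = "<your-container-name>"

# Build the dataframe in plain pandas (example data)
df = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})

# Serialize the dataframe to CSV in memory, then upload it as a block blob
blob_service = BlobServiceClient.from_connection_string(conn_str)
blob_client = blob_service.get_blob_client(container=container_name,
                                           blob="predict-transform-output.csv")
blob_client.upload_blob(df.to_csv(index=False), overwrite=True)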

Example: notebook


Output: Dataframe written to blob storage using Azure Databricks


answered Sep 30 '22 by CHEEKATLAPRADEEP-MSFT