
Write Python DataFrame as CSV into Azure Blob

I have two questions about reading and writing Python objects from/to Azure Blob storage.

  1. How can I write a pandas DataFrame as a CSV file directly into Azure Blob storage without storing it locally?

    I tried the functions create_blob_from_text and create_blob_from_stream, but neither works.

    Converting the DataFrame to a string and passing it to create_blob_from_text does write the file into the blob, but as a plain string rather than as CSV.

    df_b = df.to_string()
    block_blob_service.create_blob_from_text('test', 'OutFilePy.csv', df_b)  
    
  2. How can I read a JSON file in Azure Blob storage directly into Python?

AngiSen asked Apr 25 '18 05:04



2 Answers

The accepted answer did not work for me because it depends on the legacy azure-storage package (deprecated as of 2021). I changed it as follows, using ContainerClient from the current azure-storage-blob package:

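import os

import dotenv
import pandas as pd
from azure.storage.blob import ContainerClient

# Load CONNECTION_STRING and CONTAINER_NAME from a .env file
dotenv.load_dotenv()
container_client = ContainerClient.from_connection_string(
    conn_str=os.environ["CONNECTION_STRING"],
    container_name=os.environ["CONTAINER_NAME"],
)

df = pd.DataFrame()  # replace with the DataFrame you want to upload
blob_name = "OutFilePy.csv"  # target blob name (placeholder)

# With no path argument, to_csv returns the CSV as a string, so nothing touches disk
output = df.to_csv()
container_client.upload_blob(blob_name, output, overwrite=True, encoding="utf-8")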
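
The same steps can be wrapped in a small helper. This is only a minimal sketch, assuming the azure-storage-blob (v12) ContainerClient; the name upload_dataframe_as_csv is hypothetical and not part of any library. The client is passed in as an argument, so the function can be tried out with a stand-in object before pointing it at a real container.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    import pandas as pd
    from azure.storage.blob import ContainerClient


def upload_dataframe_as_csv(
    df: "pd.DataFrame",
    container_client: "ContainerClient",
    blob_name: str,
) -> int:
    """Hypothetical helper: upload df as a CSV blob, return the payload size in bytes."""
    # to_csv() with no path returns the CSV text as a str (no file is written)
    csv_text = df.to_csv(index=False)
    # Encode explicitly so the uploaded bytes are unambiguously UTF-8
    payload = csv_text.encode("utf-8")
    # overwrite=True replaces an existing blob with the same name
    container_client.upload_blob(name=blob_name, data=payload, overwrite=True)
    return len(payload)
```

It would be called as upload_dataframe_as_csv(df, client, "OutFilePy.csv"), where client is the object returned by ContainerClient.from_connection_string.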
Dimiter Shalvardjiev answered Oct 19 '22 05:10


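  1. How to write a Python DataFrame as a CSV file directly into Azure Blob storage without storing it locally?

You could use the pandas.DataFrame.to_csv method. Called without a path, it returns the CSV text as a string, which can be passed straight to create_blob_from_text.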
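
To see why df.to_string() from the question produced a blob that did not look like CSV, the two serializations can be compared side by side. This is a small sketch with illustrative values: to_string() produces a space-aligned table meant for reading in a console, while to_csv() produces comma-separated rows that a CSV parser can read back.

```python
import csv
import io

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["col1", "col2", "col3"])

# Console-style rendering: columns are padded with spaces, no delimiters
as_text = df.to_string()

# CSV rendering: one comma-separated line per row, header first
as_csv = df.to_csv(index=False)

print(as_text)
print(as_csv)

# Only the to_csv output parses back into the original header and rows
parsed = list(csv.reader(io.StringIO(as_csv)))
```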

Sample code:

from azure.storage.blob import BlockBlobService  # legacy azure-storage SDK (pre-v12)
import pandas as pd

head = ["col1", "col2", "col3"]
rows = [[1, 2, 3], [4, 5, 6], [8, 7, 9]]
df = pd.DataFrame(rows, columns=head)
print(df)

# With no path argument, to_csv returns the CSV text as a string
output = df.to_csv(index_label="idx")
print(output)

accountName = "***"
accountKey = "***"
containerName = "test1"
blobName = "OutFilePy.csv"

blobService = BlockBlobService(account_name=accountName, account_key=accountKey)

# Upload the in-memory CSV text; nothing is written to local disk
blobService.create_blob_from_text(containerName, blobName, output)


  2. How to read a JSON file in Azure Blob storage directly into Python?

Sample code:

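import json

from azure.storage.blob import BlockBlobService  # legacy azure-storage SDK (pre-v12)

accountName = "***"
accountKey = "***"
containerName = "test1"
blobName = "test3.json"

blobService = BlockBlobService(account_name=accountName, account_key=accountKey)

# Download the blob's content as text; nothing is written to local disk
result = blobService.get_blob_to_text(containerName, blobName)
print(result.content)

# Parse the JSON text into Python objects (dict/list)
data = json.loads(result.content)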
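
With the current azure-storage-blob (v12) package, BlockBlobService is no longer available, so the read would go through a ContainerClient instead. The following is a minimal sketch under that assumption; read_json_blob is a hypothetical helper name, and the client is passed in so the function does not depend on how the connection is configured.

```python
import json
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # imported for type hints only
    from azure.storage.blob import ContainerClient


def read_json_blob(container_client: "ContainerClient", blob_name: str) -> Any:
    """Hypothetical helper: download a blob into memory and parse it as JSON."""
    # download_blob returns a downloader; readall() yields the full content as bytes
    raw = container_client.download_blob(blob_name).readall()
    # json.loads accepts UTF-8 encoded bytes directly
    return json.loads(raw)
```

Calling read_json_blob(client, "test3.json") would return the parsed dict or list, with client created by ContainerClient.from_connection_string.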


Hope it helps you.

Jay Gong answered Oct 19 '22 06:10