 

How to store pandas dataframe data to azure blobs using python?

I want to store processed data from a pandas DataFrame to Azure blobs in parquet file format. At the moment I have to write it as a parquet file to local disk first and then upload it. Instead, I want to write a pyarrow.Table into a pyarrow.parquet.NativeFile and upload it directly. Can anyone help me with this? The code below works fine:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

battery_pq = pd.read_csv('test.csv')
######## Some data processing
battery_pq = pa.Table.from_pandas(battery_pq)
pq.write_table(battery_pq, 'example.parquet')  # written to local disk first
block_blob_service.create_blob_from_path(container_name, 'example.parquet', 'example.parquet')

I need to create the file in memory (a file-like object) and then upload it to the blob.

asked by Bhanuday Birla

2 Answers

You can either use io.BytesIO for this, or Apache Arrow also provides its own implementation, BufferOutputStream. The benefit of the latter is that it writes to the stream without the overhead of going through Python, so fewer copies are made and the GIL is released.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.read_csv('test.csv')       # or any pandas.DataFrame
table = pa.Table.from_pandas(df)

buf = pa.BufferOutputStream()      # Arrow-native in-memory output stream
pq.write_table(table, buf)         # write the parquet data directly into the buffer
block_blob_service.create_blob_from_bytes(
    container,
    "example.parquet",
    buf.getvalue().to_pybytes()    # convert the Arrow buffer to Python bytes
)
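If you prefer the plain-Python route mentioned above, the equivalent with io.BytesIO would look roughly like this (a minimal sketch, assuming the same block_blob_service and container_name as in the question):

import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.read_csv('test.csv')       # any DataFrame
table = pa.Table.from_pandas(df)

buf = io.BytesIO()                 # plain Python in-memory buffer
pq.write_table(table, buf)         # pyarrow also accepts file-like objects

block_blob_service.create_blob_from_bytes(
    container_name,                # assumed: same container as in the question
    'example.parquet',
    buf.getvalue()                 # raw bytes of the parquet file
)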
answered by Uwe L. Korn

There's a newer version of the Azure Python SDK; create_blob_from_bytes is now legacy:

import pandas as pd
from azure.storage.blob import BlobServiceClient
from io import BytesIO

# blob_store_conn_str, container_name, blob_path and df are assumed to exist
blob_service_client = BlobServiceClient.from_connection_string(blob_store_conn_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_path)

parquet_file = BytesIO()
df.to_parquet(parquet_file, engine='pyarrow')  # serialize the DataFrame to parquet in memory
parquet_file.seek(0)  # rewind the stream to the beginning after writing

blob_client.upload_blob(
    data=parquet_file
)
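For completeness, if the blob may already exist you would typically pass overwrite=True, and you can read the parquet back into a DataFrame the same way. A rough sketch, assuming the same blob_client and parquet_file as above:

from io import BytesIO

import pandas as pd

# Upload, replacing any existing blob with the same name
blob_client.upload_blob(data=parquet_file, overwrite=True)

# Download the blob back into a DataFrame to verify the round trip
downloaded = blob_client.download_blob().readall()   # bytes of the parquet file
df_roundtrip = pd.read_parquet(BytesIO(downloaded), engine='pyarrow')
print(df_roundtrip.head())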
answered by Roman

