I want to store processed data from a pandas DataFrame to Azure Blob Storage in Parquet format. Currently I have to write the Parquet file to local disk first and then upload it. Instead, I want to write a pyarrow.Table into an in-memory pyarrow NativeFile and upload that directly. Can anyone help me with this? The code below works fine:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

battery_pq = pd.read_csv('test.csv')
# ... some data processing ...
battery_pq = pa.Table.from_pandas(battery_pq)

# Write the table to local disk, then upload the file to the blob container
pq.write_table(battery_pq, 'example.parquet')
block_blob_service.create_blob_from_path(container_name, 'example.parquet', 'example.parquet')
I need to create the file in memory (a file-like object) and then upload it to the blob.
You can either use io.BytesIO for this, or alternatively Apache Arrow provides its own implementation, BufferOutputStream. The benefit of the latter is that it writes to the stream without the overhead of going through Python, so fewer copies are made and the GIL is released.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = ...  # some pandas.DataFrame
table = pa.Table.from_pandas(df)

# Write the Parquet data into an in-memory Arrow buffer instead of a file on disk
buf = pa.BufferOutputStream()
pq.write_table(table, buf)

# getvalue() returns a pyarrow.Buffer; to_pybytes() converts it to Python bytes
block_blob_service.create_blob_from_bytes(
    container,
    "example.parquet",
    buf.getvalue().to_pybytes()
)
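If you would rather use the standard library's io.BytesIO mentioned above instead of Arrow's BufferOutputStream, a minimal sketch (assuming the same block_blob_service and container variables as in the question) could look like this:

import io

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = ...  # some pandas.DataFrame
table = pa.Table.from_pandas(df)

# pq.write_table accepts a file-like object, so we can write straight into a BytesIO buffer
buf = io.BytesIO()
pq.write_table(table, buf)

# getvalue() already returns Python bytes, so no conversion is needed here
block_blob_service.create_blob_from_bytes(
    container,
    "example.parquet",
    buf.getvalue()
)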
There's a new Python SDK version; create_blob_from_bytes is now legacy.
import pandas as pd
from azure.storage.blob import BlobServiceClient
from io import BytesIO

blob_service_client = BlobServiceClient.from_connection_string(blob_store_conn_str)
blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_path)

# Write the DataFrame to an in-memory buffer as Parquet
parquet_file = BytesIO()
df.to_parquet(parquet_file, engine='pyarrow')
parquet_file.seek(0)  # change the stream position back to the beginning after writing

blob_client.upload_blob(
    data=parquet_file
)
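If you want to keep the pyarrow BufferOutputStream from the first answer but upload with the newer SDK, a sketch along these lines should also work (blob_client is the one created above; overwrite=True is my assumption about how existing blobs should be handled):

import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_pandas(df)

# Serialize the table into an in-memory Arrow buffer
buf = pa.BufferOutputStream()
pq.write_table(table, buf)

# getvalue() returns a pyarrow.Buffer; upload_blob accepts bytes, so convert it
blob_client.upload_blob(data=buf.getvalue().to_pybytes(), overwrite=True)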