Can you load a Polars dataframe directly into an s3 bucket as parquet?

looking for something like this:

Save Dataframe to csv directly to s3 Python

The API shows these arguments: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.write_parquet.html

but I'm not sure how to convert the df into a stream...

asked Jan 24 '26 14:01 by rnd om


1 Answer

Untested, since I don't have an AWS account

You could use s3fs.S3File like this:

import polars as pl
import s3fs

fs = s3fs.S3FileSystem()  # anon=False by default, so it picks up your AWS credentials
df = pl.DataFrame(
    {
        "foo": [1, 2, 3, 4, 5],
        "bar": [6, 7, 8, 9, 10],
        "ham": ["a", "b", "c", "d", "e"],
    }
)
with fs.open('my-bucket/dataframe-dump.parquet', mode='wb') as f:
    df.write_parquet(f)

Basically, s3fs gives you an fsspec-conformant file object, which Polars knows how to use because write_parquet accepts any regular file or stream.

If you want to manage your S3 connection more granularly, you can construct an S3File object from the botocore connection (see the docs linked above).
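
Also untested, but instead of relying on the environment's default credentials you can pass credentials and client options to S3FileSystem explicitly; a sketch along these lines, where the key, secret, and endpoint_url values are placeholders to substitute with your own:

```python
import polars as pl
import s3fs

df = pl.DataFrame({"foo": [1, 2, 3]})

# placeholder credentials and endpoint -- substitute your own
fs = s3fs.S3FileSystem(
    key="YOUR_ACCESS_KEY_ID",
    secret="YOUR_SECRET_ACCESS_KEY",
    client_kwargs={"endpoint_url": "https://s3.eu-central-1.amazonaws.com"},
)
with fs.open("my-bucket/dataframe-dump.parquet", mode="wb") as f:
    df.write_parquet(f)
```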

answered Jan 27 '26 04:01 by suvayu


