How to read partitioned parquet file into polars?

I'd like to read a partitioned parquet file into a polars dataframe.

In spark, it is simple:

df = spark.read.parquet("/my/path")

The polars documentation says that it should work the same way:

df = pl.read_parquet("/my/path")

But it gives me the error:

raise IsADirectoryError(f"Expected a file path; {path!r} is a directory")

How can I read this file?

asked Oct 18 '25 23:10 by lmocsi


2 Answers

As an example using S3 (since you say your files are cloud-hosted): first establish a filesystem connection (via fsspec), then create a pyarrow dataset on top of it (as suggested by Dean), and finally read that dataset into polars:

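from pyarrow.dataset import dataset
from s3fs import S3FileSystem
import polars as pl

# setup cloud filesystem access (s3fs is an fsspec implementation)
cloudfs = S3FileSystem( ... )

# reference the partitioned dataset by its directory: pyarrow datasets
# accept a directory or a list of files, not a glob pattern
pyarrow_dataset = dataset(
    source = "s3://bucket/path",
    filesystem = cloudfs,
    format = 'parquet',
    partitioning = 'hive',  # assumes Spark-style key=value directories
)

# scan lazily into polars; .collect() materialises a DataFrame
ldf = pl.scan_pyarrow_dataset( pyarrow_dataset )
df = ldf.collect()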
answered Oct 20 '25 12:10 by alexander-beedie


Here's a snippet of the read_parquet source code in polars:

if isinstance(source, str) and "*" in source and _is_local_file(source):
    from polars import scan_parquet

    scan = scan_parquet(
        source,
        n_rows=n_rows,
        rechunk=True,
        parallel=parallel,
        row_count_name=row_count_name,
        row_count_offset=row_count_offset,
        low_memory=low_memory,
    )

The important bit is that it's looking for an * in the source path.

So it seems you just need to do

df = pl.read_parquet("/my/path/*")

This glob approach only works on local filesystems, so if you're reading from cloud storage, you'd have to use pyarrow datasets to read multiple files at once without iterating over them yourself.

answered Oct 20 '25 13:10 by Dean MacGregor


