I'd like to read a partitioned parquet file into a polars dataframe.
In Spark, it is simple:
df = spark.read.parquet("/my/path")
The polars documentation says that it should work the same way:
df = pl.read_parquet("/my/path")
But it gives me the error:
raise IsADirectoryError(f"Expected a file path; {path!r} is a directory")
How do I read this partitioned dataset?
As an example using S3 (since you say your files are cloud-hosted), you first establish a filesystem connection (via fsspec), create a dataset against it (as suggested by Dean), and then read into polars from that:
from pyarrow.dataset import dataset
from s3fs import S3FileSystem

import polars as pl

# setup cloud filesystem access
cloudfs = S3FileSystem( ... )

# reference multiple parquet files as a single dataset
pyarrow_dataset = dataset(
    source="s3://bucket/path/*.parquet",
    filesystem=cloudfs,
    format="parquet",
)

# load lazily (and efficiently) into polars
ldf = pl.scan_pyarrow_dataset(pyarrow_dataset)
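Since scan_pyarrow_dataset returns a LazyFrame, you can push filters and column selections down before materializing, so only the matching files/row groups are actually read. A minimal sketch (the column names "year" and "value" are just placeholders for your own schema):

# only the relevant data is fetched when .collect() runs
df = (
    ldf
    .filter(pl.col("year") == 2023)   # hypothetical partition column
    .select(["year", "value"])        # hypothetical columns
    .collect()
)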
Here's a snippet of the source code:
if isinstance(source, str) and "*" in source and _is_local_file(source):
    from polars import scan_parquet

    scan = scan_parquet(
        source,
        n_rows=n_rows,
        rechunk=True,
        parallel=parallel,
        row_count_name=row_count_name,
        row_count_offset=row_count_offset,
        low_memory=low_memory,
    )
The important bit is that it's looking for an * in the source path.
So it seems you just need to do
df = pl.read_parquet("/my/path/*")
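If the partitions live in nested subdirectories (e.g. hive-style folders such as /my/path/year=2023/...), a recursive glob should pick them all up. This is a sketch that assumes your polars version supports ** in glob patterns:

# read every parquet file under the partition folders
df = pl.read_parquet("/my/path/**/*.parquet")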
This only works on local filesystems, so if you're reading from cloud storage you'd have to use pyarrow datasets to read multiple files at once without iterating over them yourself.
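For a hive-partitioned directory you can also point a pyarrow dataset at the directory itself and let it discover the partition columns. A minimal sketch, assuming hive-style folder names like year=2023 (swap in a filesystem argument as in the S3 example if the path is remote):

from pyarrow.dataset import dataset
import polars as pl

# point at the directory; partition folders become columns
pyarrow_dataset = dataset(
    "/my/path",
    format="parquet",
    partitioning="hive",
)

ldf = pl.scan_pyarrow_dataset(pyarrow_dataset)
df = ldf.collect()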