
Error when reading a parquet file with polars which was saved with pandas

I'd like to read a parquet file with polars (0.19.19) that was saved using pandas (2.1.3).

import pandas as pd
import polars as pl

test_df = pd.DataFrame({"a": [10, 10, 0, 100, 0]})
test_df["b"] = test_df.a.astype("category")
test_df.to_parquet("test_df.parquet")

test_pl_df = pl.read_parquet("test_df.parquet")

I get this error:

polars.exceptions.ComputeError: only string-like values are supported in dictionaries

How can I read the parquet file with polars?

Reading with pandas first works, but seems rather ugly and does not allow lazy methods such as scan_parquet.

test_pa_pl_df = pl.from_pandas(pd.read_parquet("test_df.parquet", dtype_backend="pyarrow"))
ivegotaquestion asked Oct 25 '25 07:10


1 Answer

Strictly speaking, polars can't read this file in its entirety, because it only supports categorical columns whose underlying dtype is a string, and b has integer categories.

There is a shortcut that avoids round-tripping through pandas (which itself uses pyarrow under the hood). To read the file eagerly, pass use_pyarrow=True:

test_pl_df = pl.read_parquet("test_df.parquet", use_pyarrow=True)

and it will just turn b into a regular integer column.

If you want a lazy version then you can use a pyarrow dataset like this:

import pyarrow.dataset as ds
test_pl_lf = pl.scan_pyarrow_dataset(ds.dataset("test_df.parquet"))

Alternatively, you can scan the file lazily with polars itself and select only column a, which leaves the unsupported b column out:

test_pl_lf = pl.scan_parquet("test_df.parquet").select('a')
Dean MacGregor answered Oct 27 '25 00:10


