I am breaking my head over this right now. I am new to this parquet
files, and I am running into a LOT of issues with it.
I am thrown an error that reads OSError: Passed non-file path: \datasets\proj\train\train.parquet
each time I try to create a df
from it.
I've tried this:
pq.read_pandas(r'E:\datasets\proj\train\train.parquet').to_pandas()
AND
od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='pyarrow')
I also changed the drive letter of the drive the dataset resides, and it's the SAME THING!
It's the same with all engines.
PLEASE HELP!
parquet file formats. You can open a file by selecting from file picker, dragging on the app or double-clicking a . parquet file on disk. This utility is free forever and needs you feedback to continue improving.
Parquet is a columnar format that is supported by many other data processing systems. Spark SQL provides support for both reading and writing Parquet files that automatically preserves the schema of the original data.
This might be a problem with Arrow's file path handling. You could instead pass in an already opened file:
import pandas as pd
with open(r'E:\datasets\proj\train\train.parquet', 'rb') as f:
df = pd.read_parquet(f, engine='pyarrow')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With