
Unable to read a parquet file

I am breaking my head over this right now. I am new to Parquet files, and I am running into a LOT of issues with them.

I get an error that reads OSError: Passed non-file path: \datasets\proj\train\train.parquet each time I try to create a DataFrame from the file.

I've tried both pq.read_pandas(r'E:\datasets\proj\train\train.parquet').to_pandas() and od = pd.read_parquet(r'E:\datasets\proj\train\train.parquet', engine='pyarrow').

I also changed the drive letter of the drive the dataset resides on, and it's the SAME THING!

It's the same with all engines.

PLEASE HELP!

Anonymous Person asked Mar 13 '19 16:03




1 Answer

This might be a problem with Arrow's file path handling. You could instead pass in an already opened file:

import pandas as pd

with open(r'E:\datasets\proj\train\train.parquet', 'rb') as f:
    df = pd.read_parquet(f, engine='pyarrow')
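
Before handing the path to pandas, it may also help to confirm that Python itself sees a regular file at that location, since that separates a wrong path from a problem inside Arrow. The following is a minimal sketch, not a confirmed fix: the helper names checked_parquet_path and read_parquet_via_handle are hypothetical, and it assumes pandas and pyarrow are installed for the second function.

```python
from pathlib import Path


def checked_parquet_path(path_str):
    """Return the resolved path, or raise a descriptive error before pyarrow sees it.

    Hypothetical helper, not part of pandas or pyarrow.
    """
    path = Path(path_str)
    if not path.exists():
        raise FileNotFoundError(f"No such file: {path}")
    if not path.is_file():
        raise IsADirectoryError(f"Not a regular file: {path}")
    return path.resolve()


def read_parquet_via_handle(path_str):
    """Read a Parquet file through an already opened binary handle.

    Hypothetical wrapper around the workaround shown above.
    """
    # Imported lazily so the path check above can be used without pandas.
    import pandas as pd

    path = checked_parquet_path(path_str)
    with open(path, "rb") as f:
        return pd.read_parquet(f, engine="pyarrow")
```

If checked_parquet_path raises, the path itself is the problem; if it succeeds and reading by path still fails, the open-handle route is the one to try, for example df = read_parquet_via_handle(r'E:\datasets\proj\train\train.parquet').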
Uwe L. Korn answered Oct 09 '22 01:10