I have a parquet file and I want to read the first n rows from it into a pandas data frame.
What I tried:
df = pd.read_parquet(path= 'filepath', nrows = 10)
It did not work and gave me this error:
TypeError: read_table() got an unexpected keyword argument 'nrows'
I tried the skiprows argument as well, but that gave me the same error.
Alternatively, I can read the complete parquet file and then keep only the first n rows, but that requires more computation, which I want to avoid.
Is there any way to achieve it?
After exploring around and getting in touch with the pandas dev team, the conclusion is that pandas does not support the nrows or skiprows arguments when reading a parquet file.
The reason is that pandas uses the pyarrow or fastparquet parquet engines to process parquet files, and pyarrow has no support for reading a file partially or for skipping rows (not sure about fastparquet). Below is the link to the issue on the pandas GitHub for discussion.
https://github.com/pandas-dev/pandas/issues/24511
The accepted answer is out of date. It is now possible to read only the first few rows of a parquet file into pandas, though it is a bit messy and backend dependent.
To read using PyArrow as the backend, do the following:
from pyarrow.parquet import ParquetFile
import pyarrow as pa
pf = ParquetFile('file_name.pq')
first_ten_rows = next(pf.iter_batches(batch_size = 10))
df = pa.Table.from_batches([first_ten_rows]).to_pandas()
Change batch_size = 10 to match however many rows you want to read in.
Parquet is a column-oriented storage format, designed for that... So it's normal to load the whole file to access just one row.