For example, pandas's read_csv has a chunksize argument, which makes read_csv return an iterator over the CSV file so we can read it in chunks. The Parquet format stores data in chunks as well, but there isn't a documented way to read it in chunks the way read_csv does.
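For reference, this is what the chunksize pattern looks like with read_csv (the file name and the process function are placeholders):

import pandas as pd

# chunksize turns read_csv into an iterator of DataFrames
for chunk in pd.read_csv("data.csv", chunksize=100_000):
    process(chunk)  # placeholder for whatever you do per chunk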
Is there a way to read parquet files in chunks?
We can always read a Parquet file into a DataFrame in Spark and inspect its contents. Parquet is a columnar format, better suited to analytical, write-once-read-many workloads, which makes it a good fit for read-intensive applications.
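As a rough sketch of the Spark route (the SparkSession setup and file name here are illustrative, not from the original post):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-preview").getOrCreate()

df = spark.read.parquet("dat.parquet")  # load the Parquet data into a Spark DataFrame
df.show(5)                              # peek at the first few rows
df.printSchema()                        # column names and types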
Parquet is built to support flexible compression options and efficient encoding schemes. Because every value in a column has the same data type, compressing each column is straightforward and effective (which also makes queries faster).
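If it helps, pandas exposes the codec choice through to_parquet; the DataFrame and file names below are made up for illustration:

import pandas as pd

df = pd.DataFrame({"id": range(1000), "value": [i * 0.5 for i in range(1000)]})

# snappy is fast (and the usual default); gzip trades speed for smaller files
df.to_parquet("data_snappy.parquet", compression="snappy")
df.to_parquet("data_gzip.parquet", compression="gzip")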
You can read and query Parquet files in pandas much the same way you read CSV files.
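A minimal comparison, assuming files named data.csv and data.parquet with columns id and value (all illustrative):

import pandas as pd

df_csv = pd.read_csv("data.csv")
df_parquet = pd.read_parquet("data.parquet")

# the columnar layout also lets you load just the columns you need
subset = pd.read_parquet("data.parquet", columns=["id", "value"])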
If your parquet file was not created with multiple row groups, the read_row_group method doesn't help much (there is only one group!).
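If the file does contain several row groups, pyarrow can read them one at a time; iter_batches (not mentioned above, but also part of pyarrow's ParquetFile API) yields fixed-size chunks regardless of how the file was written. The file name and process function are placeholders:

import pyarrow.parquet as pq

pf = pq.ParquetFile("dat.parquet")
print(pf.num_row_groups)  # how many row groups the writer produced

# read one row group (chunk) at a time
for i in range(pf.num_row_groups):
    chunk = pf.read_row_group(i).to_pandas()
    process(chunk)  # placeholder for per-chunk work

# or iterate in fixed-size record batches
for batch in pf.iter_batches(batch_size=100_000):
    process(batch.to_pandas())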
However, if your parquet file is partitioned as a directory of parquet files, you can use the fastparquet engine, which only works on individual files, to read each file and then either concatenate them in pandas or get the values and concatenate the ndarrays (sketched after the code below).
import pandas as pd
from glob import glob

# collect the individual partition files written under the dataset directory
files = sorted(glob('dat.parquet/part*'))

# read the first partition, then append the rest one at a time
data = pd.read_parquet(files[0], engine='fastparquet')
for f in files[1:]:
    data = pd.concat([data, pd.read_parquet(f, engine='fastparquet')])
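The other option mentioned above, concatenating the underlying ndarrays instead of the DataFrames, looks roughly like this (column order and dtypes must match across partitions):

import numpy as np
import pandas as pd
from glob import glob

files = sorted(glob('dat.parquet/part*'))
arrays = [pd.read_parquet(f, engine='fastparquet').values for f in files]
data = np.concatenate(arrays)  # stacks the partitions row-wise into one ndarray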