Is it possible to open a parquet file and iterate over it line by line, using generators? The goal is to avoid loading the whole parquet file into memory.
The file contains a pandas DataFrame.
You cannot iterate line by line, because that is not how the data is stored: Parquet is a columnar format organized into row groups. You can iterate through the row groups as follows:
from fastparquet import ParquetFile
pf = ParquetFile('myfile.parq')
for df in pf.iter_row_groups():
    # df is a pandas DataFrame holding a single row group;
    # replace the print with your own processing
    print(len(df))
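If you still want a row-at-a-time interface, one option is to wrap the row-group loop in a generator. The following is only a minimal sketch: `iter_rows` and `iter_parquet_rows` are hypothetical helper names, not part of fastparquet, and the sketch assumes each row group fits in memory on its own, since one full row group is loaded at a time.

```python
def iter_rows(row_groups):
    """Yield rows one at a time from an iterable of DataFrames."""
    for df in row_groups:
        # itertuples(index=False) yields one tuple per row, without the index
        yield from df.itertuples(index=False)


def iter_parquet_rows(path):
    """Yield rows of a parquet file, loading one row group at a time."""
    # Imported here so that iter_rows can be used without fastparquet
    from fastparquet import ParquetFile

    pf = ParquetFile(path)
    yield from iter_rows(pf.iter_row_groups())
```

Usage would look like `for row in iter_parquet_rows('myfile.parq'): ...`. Peak memory is then bounded by the largest row group rather than by the whole file.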
Alternatively, you can iterate using tensorflow_io:
import tensorflow_io as tfio
dataset = tfio.IODataset.from_parquet('myfile.parquet')
for line in dataset.take(3):
    # print the first 3 lines
    print(line)