I am using Pyarrow library for optimal storage of Pandas DataFrame. I need to process pyarrow Table row by row as fast as possible without converting it to pandas DataFrame (it won't fit in memory). Pandas has iterrows()/iterrtuples() methods. Is there any fast way to iterate Pyarrow Table except for-loop and index addressing?
This code worked for me:
for batch in table.to_batches():
d = batch.to_pydict()
for c1, c2, c3 in zip(d['c1'], d['c2'], d['c3']):
# Do something with the row of c1, c2, c3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With