Fastest way to iterate Pyarrow Table

Question

I am using the PyArrow library for efficient storage of pandas DataFrames. I need to process a PyArrow Table row by row as fast as possible without converting it to a pandas DataFrame (it won't fit in memory). Pandas has the iterrows()/itertuples() methods. Is there a fast way to iterate over a PyArrow Table other than a for loop with index addressing?

Bolo · Accepted Answer

This code worked for me:

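# table is an existing pyarrow.Table
for batch in table.to_batches():
    # Convert one record batch at a time to a dict of Python lists
    d = batch.to_pydict()
    for c1, c2, c3 in zip(d['c1'], d['c2'], d['c3']):
        # Do something with the row of c1, c2, c3
        pass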

Tags: pandas, pyarrow

Alexandr Proskurin
