Fastest way to iterate Pyarrow Table

Tags:

pandas

pyarrow

I am using the PyArrow library for efficient storage of a Pandas DataFrame. I need to process a PyArrow Table row by row as fast as possible without converting it to a Pandas DataFrame (it won't fit in memory). Pandas has the iterrows()/itertuples() methods. Is there a fast way to iterate over a PyArrow Table other than a for-loop with index addressing?

Alexandr Proskurin asked Nov 05 '18 15:11


1 Answer

This code worked for me:

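# `table` is an existing pyarrow.Table with columns c1, c2 and c3
for batch in table.to_batches():
    # Convert one record batch at a time to a dict of Python lists
    d = batch.to_pydict()
    for c1, c2, c3 in zip(d['c1'], d['c2'], d['c3']):
        # Do something with the row of c1, c2, c3
        pass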
Bolo answered Sep 17 '22 23:09