
Reading/writing pyarrow tensors from/to parquet files

In pyarrow, what is the suggested way of writing a pyarrow.Tensor (e.g. created from a numpy.ndarray) to a Parquet file? Is it even possible without having to go through pyarrow.Table and pandas.DataFrame?

asked Oct 17 '17 at 15:10 by Martin Studer



2 Answers

The data model for Parquet is tabular, so somewhere the tensor/ndarray must get converted to a tabular form. We don't have any built-in convenience functions to help with this, but feel free to make specific feature requests on the issue tracker https://issues.apache.org/jira/projects/ARROW

answered Sep 25 '22 at 02:09 by Wes McKinney



The Parquet format is optimised for tables with nested data, i.e. it expects data to be represented as named columns. This is somewhat at odds with the idea of n-dimensional columns. For tensors, it is better to choose a different format.

answered Sep 26 '22 at 02:09 by Uwe L. Korn
