In pyarrow, what is the suggested way of writing a pyarrow.Tensor (e.g. created from a numpy.ndarray) to a Parquet file? Is it even possible without having to go through pyarrow.Table and pandas.DataFrame?
The data model for Parquet is tabular, so somewhere the tensor/ndarray must be converted to a tabular form. We don't have any built-in convenience functions to help with this, but feel free to make specific feature requests on the issue tracker: https://issues.apache.org/jira/projects/ARROW
The Parquet format is optimised for tables, including tables with nested data; it expects data to be represented as named columns. This contrasts with the idea of n-dimensional arrays. For tensors, it is better to choose a different format.