A pandas DataFrame is heavyweight, so I want to avoid it. But I still want to construct a PyArrow Table in order to store the data in Parquet format.
I searched and read the documentation and tried to use from_arrays(), but it is not working.
field=[pa.field('name',pa.string()),pa.field('age',pa.int64())]
arrays=[pa.array(['Tom']),pa.array([23])]
pa.Table.from_arrays(pa.schema(field),arrays)
The error is: Length of names (1) doesn't match length of arrays (2)
See the Table.from_arrays documentation here: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_arrays
The first argument it expects is the list of arrays, not the schema. So you can either pass the schema as a keyword argument:
In [64]: pa.Table.from_arrays(arrays, schema=pa.schema(field))
Out[64]:
pyarrow.Table
name: string
age: int64
Or pass the column names instead of the full schema:
In [65]: pa.Table.from_arrays(arrays, names=['name', 'age'])
Out[65]:
pyarrow.Table
name: string
age: int64
Starting with pyarrow 0.14.0, you can also do:
In [51]: pa.Table.from_pydict({'name': pa.array(['Tom']), 'age': pa.array([23])})
Out[51]:
pyarrow.Table
name: string
age: int64