
Constructing a PyArrow Table directly from data, without creating a DataFrame


A pandas DataFrame is heavyweight, so I want to avoid creating one. But I still want to construct a PyArrow Table in order to store the data in Parquet format.

I searched and read the documentation and tried to use Table.from_arrays(), but it is not working.

import pyarrow as pa

field = [pa.field('name', pa.string()), pa.field('age', pa.int64())]
arrays = [pa.array(['Tom']), pa.array([23])]
pa.Table.from_arrays(pa.schema(field), arrays)

The error is: Length of names (1) doesn't match length of arrays (2)

Zichu Lee asked Jun 17 '19 21:06





1 Answer

See the Table.from_arrays documentation here: https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_arrays The first argument it expects is the list of arrays, not the schema. So you can either pass the schema as a keyword argument:

In [64]: pa.Table.from_arrays(arrays, schema=pa.schema(field))
Out[64]: 
pyarrow.Table
name: string
age: int64

Or pass the column names instead of the full schema:

In [65]: pa.Table.from_arrays(arrays, names=['name', 'age']) 
Out[65]: 
pyarrow.Table
name: string
age: int64

Starting with pyarrow 0.14.0 (the next release at the time of writing), you will also be able to do:

In [51]: pa.Table.from_pydict({'name': pa.array(['Tom']), 'age': pa.array([23])})
Out[51]: 
pyarrow.Table
name: string
age: int64
joris answered Sep 28 '22 17:09
