I'm creating a Parquet file from a Python list of dictionaries with pandas and pyarrow, but I get the following error for an empty nested dictionary:
Cannot write struct type 'subject' with no child field to Parquet. Consider adding a dummy child field
Here is the code:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
data = [
    {
        "name": "david",
        "subject": {}
    }
]
df = pd.DataFrame.from_records(data)
table = pa.Table.from_pandas(df)
pq.write_table(table, 'file1.parquet')
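Arrow is unable to guess the type of "subject" from the data you gave it (because it's empty). "subject" could be either a struct (a fixed set of named child fields, none of which appear in an empty dict) or a map (arbitrary key/value pairs).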
To resolve this ambiguity, you need to provide an explicit schema to the Table.from_pandas function:
schema = pa.schema([
    pa.field("name", pa.string()),
    pa.field("subject", pa.map_(pa.string(), pa.string())),
])
table = pa.Table.from_pandas(df, schema=schema)
But even with the schema it doesn't work, because pyarrow expects map data to be represented as a list of (key, value) tuples instead of a dict (at least in the pyarrow versions this answer was written for):
data = [
    {"name": "david", "subject": []},
    {"name": "john", "subject": [("key1", "value1"), ("key2", "value2")]},
]
df = pd.DataFrame.from_records(data)  # rebuild the DataFrame from the tuple-based data
schema = pa.schema([
    pa.field("name", pa.string()),
    pa.field("subject", pa.map_(pa.string(), pa.string())),
])
table = pa.Table.from_pandas(df, schema=schema)
pq.write_table(table, 'file1.parquet')