I'm getting this error when converting a pandas DataFrame to Parquet using PyArrow:
ArrowInvalid('Error converting from Python objects to Int64: Got Python object of type str but can only handle these types: integer
To find out which column is the problem, I built a new DataFrame in a for loop, starting with the first column and adding one more column on each iteration. The error comes from a column of dtype object whose values start with 0s. I guess that's why PyArrow tries to convert the column to int, and then fails because the other values are UUIDs.
I'm trying to pass a schema (not sure if this is the way to go):
table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
where schema is df.dtypes.
Carlos, have you tried converting the column to one of the pandas types listed in the PyArrow pandas integration docs (https://arrow.apache.org/docs/python/pandas.html)?
Can you post the output of df.dtypes?
If changing the pandas column type doesn't help, you can define a PyArrow schema and pass it in:
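import pyarrow as pa

# Each field pairs a column name with the Arrow type it should be converted to
fields = [
    pa.field('id', pa.int64()),
    pa.field('secondaryid', pa.int64()),
    pa.field('date', pa.timestamp('ms')),
]
my_schema = pa.schema(fields)
# sample_df is your DataFrame; DataFrame columns not named in the schema are ignored
table = pa.Table.from_pandas(sample_df, schema=my_schema, preserve_index=False)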
More information:
https://arrow.apache.org/docs/python/data.html
https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas
https://arrow.apache.org/docs/python/generated/pyarrow.schema.html