Assign schema to pa.Table.from_pandas()

I'm getting this error when converting a pandas DataFrame to Parquet using pyarrow:

ArrowInvalid('Error converting from Python objects to Int64: Got Python object of type str but can only handle these types: integer

To find out which column is the problem, I built a new DataFrame in a for loop, starting with the first column and adding one more column on each iteration. The error comes from a column of dtype object whose first values are 0s. I guess that's why pyarrow tries to convert the column to int, and it then fails because the other values are UUIDs.
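The column-by-column search described above can probably be shortened by checking which object columns hold more than one Python type. The following is only a sketch: the helper name `find_mixed_type_columns` and the sample DataFrame are made up for illustration and are not part of the original question.

```python
import pandas as pd


def find_mixed_type_columns(df):
    """Map each object-dtype column that mixes Python types to the type names found."""
    mixed = {}
    for name in df.columns:
        # Only object columns can hold arbitrary Python objects.
        if df[name].dtype != object:
            continue
        # Collect the Python type name of every non-null value.
        type_names = {type(value).__name__ for value in df[name].dropna()}
        if len(type_names) > 1:
            mixed[name] = type_names
    return mixed


# Hypothetical data: "id" starts with integer zeros, then holds a string.
sample = pd.DataFrame(
    {
        "id": [0, 0, "a1b2-c3d4"],
        "name": ["x", "y", "z"],
        "n": [1, 2, 3],
    }
)
mixed_columns = find_mixed_type_columns(sample)
print(mixed_columns)
```

Any column reported here is a likely candidate for the conversion error, because pyarrow has to pick a single Arrow type per column.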

I'm trying to pass a schema (not sure if this is the way to go):

table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)

where schema is: df.dtypes


asked Mar 29 '18 22:03

Carlos P Ceballos

1 Answer

Carlos, have you tried converting the column to one of the pandas types listed in the pyarrow pandas documentation? See https://arrow.apache.org/docs/python/pandas.html

Can you post the output of df.dtypes?

If changing the pandas column type doesn't help, you can define a pyarrow schema and pass it in.

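import pyarrow as pa

# Declare the Arrow type of each column explicitly instead of relying on inference.
fields = [
    pa.field('id', pa.int64()),
    pa.field('secondaryid', pa.int64()),
    pa.field('date', pa.timestamp('ms')),
]

my_schema = pa.schema(fields)

# sample_df is the pandas DataFrame to convert; its columns must match the schema.
table = pa.Table.from_pandas(sample_df, schema=my_schema, preserve_index=False)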

More information here:

https://arrow.apache.org/docs/python/data.html
https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.from_pandas
https://arrow.apache.org/docs/python/generated/pyarrow.schema.html


answered Oct 17 '22 17:10

Alexander

Tags: python, pandas, parquet, pyarrow
