
Pyarrow: TypeError: an integer is required (got type str)

I have a DataFrame with the following dtypes:

[2020-02-06 19:15:06,579] {logging_mixin.py:95} INFO - 
campanha                      object
chave_sistema_origem           int64
valor_ajustado                object

The column valor_ajustado contains some value that throws an exception when I try to write a Parquet file using df.to_parquet(buffer, index=False):

[2020-02-06 19:15:06,597] {taskinstance.py:1047} ERROR - an integer is required (got type str)
...
  File "/Users/jackhammer/.virtualenvs/python373/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 540, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 207, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 78, in pyarrow.lib._ndarray_to_array

I know that the valor_ajustado column has values like:

0
123,48
1
493,987

Does anyone know why it's trying to convert the values to integers instead of keeping the column as an object?
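The error message suggests that the object column mixes Python int cells with str cells, so Arrow settled on an integer type and then failed on a string. A minimal pandas sketch to check this, assuming a hypothetical df rebuilt from the sample values listed above (the real data may differ):

```python
import pandas as pd

# Hypothetical sample rebuilt from the values listed in the question:
# Python ints mixed with strings that use a comma as decimal separator.
df = pd.DataFrame({"valor_ajustado": [0, "123,48", 1, "493,987"]})

print(df["valor_ajustado"].dtype)  # object

# Count the Python type of every cell; more than one type in an
# object column means Arrow has to pick one type for all of them.
cell_types = df["valor_ajustado"].map(lambda v: type(v).__name__)
print(cell_types.value_counts())

# Rows holding strings are the ones that break integer conversion.
is_str = df["valor_ajustado"].map(lambda v: isinstance(v, str))
print(df[is_str])
```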


1 Answer

There is no data type in Apache Arrow to hold Python objects, so a supported strong data type has to be inferred (this is also true of Parquet files). I would cleanse the valor_ajustado column to make sure all the values are numeric (there must be a string or some other bad value within).
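A minimal sketch of that cleansing step with pandas, assuming (as the sample values suggest) that the comma is a decimal separator, so "123,48" means 123.48; if the comma is really a thousands separator, the replacement has to change. The df below is the same hypothetical sample, not the asker's real data:

```python
import pandas as pd

# Hypothetical sample mixing ints and comma-decimal strings.
df = pd.DataFrame({"valor_ajustado": [0, "123,48", 1, "493,987"]})

# Assumption: "," is the decimal separator. Convert every cell to text
# first so ints and strings go through the same parsing path.
as_text = (
    df["valor_ajustado"]
    .astype(str)
    .str.strip()
    .str.replace(",", ".", regex=False)
)

# errors="coerce" turns anything still unparseable into NaN instead of
# raising, so leftover bad values can be inspected before writing.
df["valor_ajustado"] = pd.to_numeric(as_text, errors="coerce")

print(df.dtypes)
print(df[df["valor_ajustado"].isna()])  # rows that still need attention
```

With the column now float64, df.to_parquet(buffer, index=False) no longer has to infer a type from mixed Python objects. Coercing to NaN is a design choice; pass errors="raise" instead if any unparseable value should stop the job.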

Wes McKinney answered Sep 07 '25 15:09