I have a DataFrame with the following dtypes:
[2020-02-06 19:15:06,579] {logging_mixin.py:95} INFO -
campanha object
chave_sistema_origem int64
valor_ajustado object
The column valor_ajustado contains some value that raises an exception when I try to write a Parquet file using df.to_parquet(buffer, index=False):
[2020-02-06 19:15:06,597] {taskinstance.py:1047} ERROR - an integer is required (got type str)
...
File "/Users/jackhammer/.virtualenvs/python373/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 540, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow/array.pxi", line 207, in pyarrow.lib.array
File "pyarrow/array.pxi", line 78, in pyarrow.lib._ndarray_to_array
I know that the column valor_ajustado has values like:
0
123,48
1
493,987
Does anyone know why it's trying to handle the values as integers instead of keeping the column as an object?
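There is no data type in Apache Arrow to hold arbitrary Python objects, so a supported, strongly typed data type has to be inferred (this is also true of Parquet files). Judging by the error message, an integer type was inferred for the column and the conversion then failed on a string value such as 123,48. I would cleanse the valor_ajustado column to make sure all the values are numeric (there must be a string or some other bad value in it).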