 

Google BigQuery Schema conflict (pyarrow error) with Numeric data type using load_table_from_dataframe

I get the following error when I upload numeric data (int64 or float64) from a pandas DataFrame to a Google BigQuery column of type NUMERIC:

pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)

I tried changing the dtype of the 'tt' column in the DataFrame, without success:

df_data_f['tt'] = df_data_f['tt'].astype('float64')

and

df_data_f['tt'] = df_data_f['tt'].astype('int64')

Using the schema:

job_config.schema = [
    ...
    bigquery.SchemaField('tt', 'NUMERIC')
    ...]

In this google-cloud-python issue report I found the following type mapping:

NUMERIC = pyarrow.decimal128(38, 9)

So the BigQuery NUMERIC type is represented as a 16-byte decimal128 value, while float64 and int64 are only 8 bytes wide. That matches the error message (length 8, expected 16) and is why pyarrow can't match the data types.
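The size mismatch in the error message can be checked with the standard library alone. This is only a sketch of the arithmetic, not of what pyarrow does internally; the constant names are made up for illustration.

```python
import struct

# Width in bytes of the fixed-size values involved in the error.
FLOAT64_BYTES = struct.calcsize("<d")  # IEEE 754 double, as in pandas float64
INT64_BYTES = struct.calcsize("<q")    # signed 64-bit integer, as in pandas int64
DECIMAL128_BYTES = 128 // 8            # decimal128 holds a 128-bit value

print(FLOAT64_BYTES, INT64_BYTES, DECIMAL128_BYTES)
```

Both pandas dtypes are half the width that a decimal128 column expects, so casting between float64 and int64 cannot resolve the error.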


I have:

Python 3.6.4

pandas 1.0.3

pyarrow 0.17.0

google-cloud-bigquery 1.24.0

David Valenzuela Urrutia asked Jun 17 '26 03:06



1 Answer

I'm not sure if this is the best solution, but I solved the issue by converting the column values to decimal.Decimal:

import decimal
...
df_data_f['tt'] = df_data_f['tt'].astype(str).map(decimal.Decimal)
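A standalone sketch of the same conversion follows, showing why the values go through str first. The helper name to_numeric and the rounding to 9 fractional digits are my own assumptions (chosen to match the scale in decimal128(38, 9)), not something the BigQuery client requires.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Assumption: round to 9 fractional digits, the scale of decimal128(38, 9).
NUMERIC_SCALE = Decimal("0.000000001")

def to_numeric(value):
    """Hypothetical helper: convert an int or float to Decimal via str."""
    # str() yields the short repr ("0.1"); Decimal(0.1) would instead
    # capture the full binary expansion of the float.
    return Decimal(str(value)).quantize(NUMERIC_SCALE, rounding=ROUND_HALF_EVEN)

values = [1, 2.5, 0.1]
converted = [to_numeric(v) for v in values]
print(converted)

# Converting the float directly keeps its binary rounding error.
print(Decimal(0.1) == Decimal("0.1"))
```

With pandas, a helper like this could be applied per value, for example df_data_f['tt'].map(to_numeric), which leaves the column with object dtype holding Decimal instances.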
David Valenzuela Urrutia answered Jun 18 '26 15:06



