While trying to use to_gbq to update a Google BigQuery table, I get this response:
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
My code:
gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
and my dataframe mini_df looks like:
date request_number name feature_name value_name value
2018-01-10 1 1 "a" "b" 0.309457
2018-01-10 1 1 "c" "d" 0.273748
When I run to_gbq and the table doesn't exist yet in BigQuery, I can see that the table is created with the following schema:
date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE
What am I doing wrong? How can I solve this?
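For reference, here's how I can inspect what pandas inferred for each column, plus a rough peek at the newline-delimited JSON that would be produced for the first rows (this to_json call is only my approximation of what pandas-gbq does internally, not its exact code path):

# Show the dtypes pandas inferred; the object columns are the ones that come
# out as STRING in the auto-generated BigQuery schema, float64 comes out as FLOAT.
print(mini_df.dtypes)

# Rough approximation of the newline-delimited JSON produced for the first rows.
print(mini_df.head(2).to_json(orient='records', lines=True))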
P.S. Here's the rest of the exception:
BadRequest Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
589 destination_table,
--> 590 job_config=job_config).result()
591 except self.http_error as ex:
~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
527 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528 return super(_AsyncJob, self).result(timeout=timeout)
529
~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
110 # Pylint doesn't recognize that this is valid in this case.
--> 111 raise self._exception
112
BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
During handling of the above exception, another exception occurred:
GenericGBQException Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
106 chunksize=chunksize,
107 verbose=verbose, reauth=reauth,
--> 108 if_exists=if_exists, private_key=private_key)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
987 table.create(table_id, table_schema)
988
--> 989 connector.load_data(dataframe, dataset_id, table_id, chunksize)
990
991
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
590 job_config=job_config).result()
591 except self.http_error as ex:
--> 592 self.process_http_error(ex)
593
594 rows = []
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
454 # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
455
--> 456 raise GenericGBQException("Reason: {0}".format(ex))
457
458 def run_query(self, query, **kwargs):
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
I've had the very same problem. In my case it depended on the data type object of the DataFrame.
I had three columns: externalId, mappingId and info. I didn't set a data type for any of those fields and let pandas do its magic. It decided to set all three column data types to object. The problem is that, internally, the to_gbq component uses the to_json component. For some reason or another this output omits the quotes around a data field if the type of the field is object but it holds only numerical values.
So Google BigQuery needed this:
{"externalId": "12345", "mappingId": "abc123", "info": "blerb"}
but got this:
{"externalId": 12345, "mappingId": "abc123", "info": "blerb"}
And because the field was mapped as STRING in Google BigQuery, the import process failed.
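A small sketch of that behaviour (column names borrowed from above, values made up): an object column that actually holds a Python int is written without quotes, while a genuine string is quoted.

import pandas as pd

# All three columns end up as dtype object, but externalId holds an int.
df = pd.DataFrame({'externalId': [12345],
                   'mappingId': ['abc123'],
                   'info': ['blerb']}, dtype=object)

print(df.dtypes)  # all three columns show as object

# The int is serialized without quotes, which clashes with a STRING column.
print(df.to_json(orient='records', lines=True))
# {"externalId":12345,"mappingId":"abc123","info":"blerb"}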
Two solutions came up.
Solution 1 - Change the data type of the column
A simple type conversion helped with this issue. I also had to change the data type in BigQuery to INTEGER.
df['externalId'] = df['externalId'].astype('int')
With the column typed as INTEGER, BigQuery can consume fields without quotes, as the JSON standard allows.
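If recreating the table is acceptable, one way to get the INTEGER column on the BigQuery side is to cast the column and let to_gbq regenerate the schema with if_exists='replace'. This is just a sketch reusing the call from the question; 'replace' drops the existing table, so only use it if losing its contents is fine.

import pandas as pd
from pandas.io import gbq

df = pd.DataFrame({'externalId': [12345],
                   'mappingId': ['abc123'],
                   'info': ['blerb']}, dtype=object)
df['externalId'] = df['externalId'].astype('int')

# 'replace' drops and recreates the table, so the regenerated schema maps the
# now-int64 externalId column to INTEGER instead of STRING.
gbq.to_gbq(df, 'Name-of-Table', 'Project-id',
           chunksize=10000, reauth=False,
           if_exists='replace', private_key=None)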
Solution 2 - Make sure the string field is a string
Again, this is setting the data type. But since we set it explicitly to str, the export with to_json prints out a quoted field and everything worked fine.
df['externalId'] = df['externalId'].astype('str')
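A quick check of the same sketch with the explicit cast to str: to_json now quotes the value, which matches the existing STRING column.

import pandas as pd

df = pd.DataFrame({'externalId': [12345],
                   'mappingId': ['abc123'],
                   'info': ['blerb']}, dtype=object)
df['externalId'] = df['externalId'].astype('str')

# The value is now a real string, so to_json quotes it.
print(df.to_json(orient='records', lines=True))
# {"externalId":"12345","mappingId":"abc123","info":"blerb"}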