Using to_gbq from pandas to update Google BigQuery raises GenericGBQException

While trying to use to_gbq to update a Google BigQuery table, I get the following response:

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

My code:

gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)

and my DataFrame mini_df looks like:

date        request_number  name  feature_name  value_name  value
2018-01-10  1               1     "a"           "b"         0.309457
2018-01-10  1               1     "c"           "d"         0.273748
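
For reference, a rough reconstruction of that frame (just a sketch: request_number and name are forced to the generic object dtype here, which is what the auto-created STRING schema shown below suggests they had):

import pandas as pd

# Sketch of the sample rows above; the numeric-looking columns carry the
# generic object dtype instead of a numeric one.
mini_df = pd.DataFrame({
    'date': ['2018-01-10', '2018-01-10'],
    'request_number': pd.Series([1, 1], dtype='object'),
    'name': pd.Series([1, 1], dtype='object'),
    'feature_name': ['a', 'c'],
    'value_name': ['b', 'd'],
    'value': [0.309457, 0.273748],
})

print(mini_df.dtypes)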

When I run to_gbq and the table does not yet exist in BigQuery, I can see that the table is created with the following schema:

date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE

What am I doing wrong? How can I solve this?

P.S. Here is the rest of the exception:

BadRequest                                Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    589                         destination_table,
--> 590                         job_config=job_config).result()
    591                 except self.http_error as ex:

~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
    527         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528         return super(_AsyncJob, self).result(timeout=timeout)
    529 

~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    110             # Pylint doesn't recognize that this is valid in this case.
--> 111             raise self._exception
    112 

BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

During handling of the above exception, another exception occurred:

GenericGBQException                       Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
    106                       chunksize=chunksize,
    107                       verbose=verbose, reauth=reauth,
--> 108                       if_exists=if_exists, private_key=private_key)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
    987         table.create(table_id, table_schema)
    988 
--> 989     connector.load_data(dataframe, dataset_id, table_id, chunksize)
    990 
    991 

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    590                         job_config=job_config).result()
    591                 except self.http_error as ex:
--> 592                     self.process_http_error(ex)
    593 
    594                 rows = []

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
    454         # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
    455 
--> 456         raise GenericGBQException("Reason: {0}".format(ex))
    457 
    458     def run_query(self, query, **kwargs):

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
1 Answer

I've had the very same problem.

In my case it came down to the object data type in the DataFrame.

I had three columns: externalId, mappingId, and info. I didn't set a data type for any of those fields and let pandas do its magic.

It decided to set all three columns' data type to object. The problem is that internally to_gbq uses to_json. For one reason or another, that output omits the quotes around a field if its type is object but it holds only numerical values.

So Google BigQuery needed this:

{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}

but got this:

{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}

And because the field was mapped as STRING in Google BigQuery, the import process failed.
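
That behavior can be reproduced directly with to_json (a minimal sketch; the column names simply mirror the example above, and externalId is forced to the object dtype to match the situation described):

import pandas as pd

# externalId holds a number but carries the generic object dtype,
# mirroring what pandas had inferred in the case described above.
df = pd.DataFrame({
    'externalId': pd.Series([12345], dtype='object'),
    'mappingId': ['abc123'],
    'info': ['blerb'],
})

print(df.to_json(orient='records', lines=True))
# {"externalId":12345,"mappingId":"abc123","info":"blerb"}  <- 12345 without quotes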

Two solutions came up.

Solution 1 - Change the data type of the column

A simple type conversion helped with this issue. I also had to change the column's data type in BigQuery to INTEGER.

df['externalId'] = df['externalId'].astype('int')

In that case, BigQuery can consume fields without quotes, as the JSON standard allows.
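
Put together, the fix could look like this (a sketch; the table and project names are the placeholders from the question):

from pandas.io import gbq

# Cast to a real integer dtype so to_json emits the value without quotes,
# matching the column that is now INTEGER in BigQuery.
df['externalId'] = df['externalId'].astype('int')

gbq.to_gbq(df, 'Name-of-Table', 'Project-id',
           reauth=False, if_exists='append', private_key=None)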

Solution 2 - Make sure the string field is a string

Again, this sets the data type. But since we set it explicitly to str, the export with to_json prints a quoted field and everything works fine.

df['externalId'] = df['externalId'].astype('str')
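
Applied to the mini_df from the question, that would mean making sure every column that BigQuery auto-created as STRING actually holds strings (a sketch under that assumption):

# The auto-created schema has date, request_number, name, feature_name and
# value_name as STRING, so force those columns to hold strings.
for col in ['date', 'request_number', 'name', 'feature_name', 'value_name']:
    mini_df[col] = mini_df[col].astype('str')

print(mini_df.dtypes)   # value stays float64, everything else is object (str values)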