I'm trying to query data from a MySQL server and write it to Google BigQuery using the pandas .to_gbq API.
import pandas as pd

def production_to_gbq(table_name_prod, prefix, table_name_gbq, dataset, project):
    # Extract data from production (`con` is a MySQL connection opened elsewhere)
    q = """
        SELECT *
        FROM {}
        """.format(table_name_prod)
    df = pd.read_sql(q, con)
    # Write to GBQ
    df.to_gbq(dataset + table_name_gbq, project, chunksize=1000, verbose=True,
              reauth=False, if_exists='replace', private_key=None)
    return df
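For context, I call it roughly like this (the names here are placeholders; the dataframe is the departments table that appears in the traceback below):

# Placeholder names; note that `dataset` needs its trailing '.' since the
# function concatenates it directly with the table name
departments = production_to_gbq(table_name_prod='departments', prefix='',
                                table_name_gbq='departments',
                                dataset='my_dataset.', project='my-gcp-project')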
I keep getting a 400 error indicating invalid input.
Load is 100.0% Complete
---------------------------------------------------------------------------
BadRequest Traceback (most recent call last)
/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize, schema)
569 self.client, dataframe, dataset_id, table_id,
--> 570 chunksize=chunksize):
571 self._print("\rLoad is {0}% Complete".format(
/usr/local/lib/python3.6/site-packages/pandas_gbq/_load.py in load_chunks(client, dataframe, dataset_id, table_id, chunksize, schema)
73 destination_table,
---> 74 job_config=job_config).result()
/usr/local/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
527 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528 return super(_AsyncJob, self).result(timeout=timeout)
529
/usr/local/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
110 # Pylint doesn't recognize that this is valid in this case.
--> 111 raise self._exception
112
BadRequest: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 10; errors: 1. Please look into the error stream for more details.
During handling of the above exception, another exception occurred:
GenericGBQException Traceback (most recent call last)
<ipython-input-73-ef9c7cec0104> in <module>()
----> 1 departments.to_gbq(dataset + table_name_gbq, project, chunksize=1000, verbose=True, reauth=False, if_exists='replace', private_key=None)
2
/usr/local/lib/python3.6/site-packages/pandas/core/frame.py in to_gbq(self, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
1058 return gbq.to_gbq(self, destination_table, project_id=project_id,
1059 chunksize=chunksize, verbose=verbose, reauth=reauth,
-> 1060 if_exists=if_exists, private_key=private_key)
1061
1062 @classmethod
/usr/local/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
107 chunksize=chunksize,
108 verbose=verbose, reauth=reauth,
--> 109 if_exists=if_exists, private_key=private_key)
/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver, table_schema)
980 connector.load_data(
981 dataframe, dataset_id, table_id, chunksize=chunksize,
--> 982 schema=table_schema)
983
984
/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize, schema)
572 ((total_rows - remaining_rows) * 100) / total_rows))
573 except self.http_error as ex:
--> 574 self.process_http_error(ex)
575
576 self._print("\n")
/usr/local/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
453 # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
454
--> 455 raise GenericGBQException("Reason: {0}".format(ex))
456
457 def run_query(self, query, **kwargs):
GenericGBQException: Reason: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 10; errors: 1. Please look into the error stream for more details.
I've investigated the table schema:
id INTEGER NULLABLE
name STRING NULLABLE
description STRING NULLABLE
created_at INTEGER NULLABLE
modified_at FLOAT NULLABLE
and it corresponds to the dataframe's dtypes:
id int64
name object
description object
created_at int64
modified_at float64
The table is created in GBQ but remains empty.
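One avenue I haven't exhausted: pandas_gbq.to_gbq accepts an explicit table_schema argument (it shows up in the signature in the traceback above), which pins the column types instead of letting them be inferred. A minimal sketch, with names and types copied from the schema above:

import pandas_gbq

# Explicit schema matching the BigQuery table listed above
schema = [
    {'name': 'id', 'type': 'INTEGER'},
    {'name': 'name', 'type': 'STRING'},
    {'name': 'description', 'type': 'STRING'},
    {'name': 'created_at', 'type': 'INTEGER'},
    {'name': 'modified_at', 'type': 'FLOAT'},
]
pandas_gbq.to_gbq(df, dataset + table_name_gbq, project_id=project,
                  if_exists='replace', table_schema=schema)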
I read around a little but am not finding much on the pandas to_gbq API, except for this question, which seemed relevant but has no reply:
bigquery table is empty when using pandas to_gbq
I found one potential solution concerning numbers in object-dtype columns, which get passed into the GBQ table without quotes; it is fixed by casting those columns to string:
I use to_gbq on pandas for updating Google BigQuery and get GenericGBQException
I tried the fix:
for col in df.columns:
    if df[col].dtype == object:
        df[col] = df[col].fillna('')
        df[col] = df[col].astype(str)
Unfortunately I still get the same error. Similarly, filling in missing data and setting dtypes for the int and float columns also gives the same error, as sketched below.
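Roughly, the missing-data / numeric-dtype handling looked like this (the fill values are arbitrary placeholders on my part):

# Coerce the numeric columns explicitly; column names from the schema above
df['id'] = pd.to_numeric(df['id'], errors='coerce').fillna(0).astype('int64')
df['created_at'] = pd.to_numeric(df['created_at'], errors='coerce').fillna(0).astype('int64')
df['modified_at'] = pd.to_numeric(df['modified_at'], errors='coerce').astype('float64')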
Is there something I'm missing?
Found that BigQuery cannot properly handle \r (and sometimes \n too). I had the same issue, localized the problem, and was really surprised when just replacing \r with a space fixed it:
# `unicode` only exists on Python 2; on Python 3 (as in the traceback above),
# checking for `str` is enough
for col in list(df.columns):
    df[col] = df[col].apply(lambda x: x.replace('\r', ' ') if isinstance(x, str) else x)
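To localize which columns actually contain the stray carriage returns before fixing them, a quick diagnostic scan like this helps (just a sketch):

# Print object columns whose cells contain '\r', with a few offending row indexes
for col in df.columns:
    if df[col].dtype == object:
        mask = df[col].astype(str).str.contains('\r', regex=False)
        if mask.any():
            print(col, df.index[mask].tolist()[:10])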