I am trying to append a table to a different table through pandas, pulling the data from BigQuery and sending it to a different BigQuery dataset. While the table schema is exactly the same i get the error " "Please verify that the structure and " pandas_gbq.gbq.InvalidSchema: Please verify that the structure and data types in the DataFrame match the schema of the destination table."
This error occurred earlier where I went for table overwrites but in this case the datasets are too large to do that (and that is not a sustainable solution).
df = pd.read_gbq(query, project_id="my-project", credentials=bigquery_key,
dialect='standard')
pd.io.gbq.to_gbq(df, dataset, projectid,
if_exists='append',
table_schema=[{'name': 'Date','type': 'STRING'},
{'name': 'profileId','type': 'STRING'},
{'name': 'Opco','type': 'STRING'},
{'name': 'country','type': 'STRING'},
{'name': 'deviceType','type': 'STRING'},
{'name': 'userType','type': 'STRING'},
{'name': 'users','type': 'INTEGER'},
{'name': 'sessions','type': 'INTEGER'},
{'name': 'bounceRate','type': 'FLOAT'},
{'name': 'sessionsPerUser','type': 'FLOAT'},
{'name': 'avgSessionDuration','type': 'FLOAT'},
{'name': 'pageviewsPerSession','type': 'FLOAT'}
],
credentials=bigquery_key)
The schema in BigQuery is as follows:
Date STRING
profileId STRING
Opco STRING
country STRING
deviceType STRING
userType STRING
users INTEGER
sessions INTEGER
bounceRate FLOAT
sessionsPerUser FLOAT
avgSessionDuration FLOAT
pageviewsPerSession FLOAT
I then get the following error:
Traceback (most recent call last): File "..file.py", line 63, in
<module>
main()
File "..file.py", line 57, in main
updating_general_data(bigquery_key)
File "..file.py", line 46, in updating_general_data
credentials=bigquery_key)
File
"..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\io\gbq.py",
line 162, in to_gbq
credentials=credentials, verbose=verbose, private_key=private_key)
File
"..\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas_gbq\gbq.py",
line 1141, in to_gbq
"Please verify that the structure and " pandas_gbq.gbq.InvalidSchema: Please verify that the structure and
data types in the DataFrame match the schema of the destination table.
To me it seems that there is a 1 on 1 match. I've seen other threads talk about this and these threads are mainly talking about date formats even though the date format is already a string in this case and is then with the table_schema still made as string.
Ultimate solution to this is instead of manually specifying schema which can always be prone to type casting/naming errors, Its always best top get the schema from the very table. So have a client using latest version of API:
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(
'credentials.json')
project_id = 'your_project_id',
client = bigquery.Client(credentials= credentials,project=project_id)
Get the table you want to write/append to:
table = client.get_table('your_dataset.your_table')
table
Generate schema from the table:
generated_schema = [{'name':i.name, 'type':i.field_type} for i in table.schema]
generated_schema
Rename your dataframe accordingly:
data.columns = [i.name for i in table.schema]
Pass the same schema while pushing it to BigQuery:
data.to_gbq(project_id = 'your_project_id',
destination_table = 'your_dataset.your_table',
credentials = service_account.Credentials.from_service_account_file(
'credentials.json'),
table_schema = generated_schema,
progress_bar = True,
if_exists = 'replace')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With