I have some code that automatically generates a number of SQL queries, and I would like to create BigQuery views from them. Because the underlying data changes, these views need to be regenerated every night. What I would like to do is create a view using the Google BigQuery API for Python. I understand how to do it with the 'bq' command-line tool, but I'd like to have this built directly into the code rather than using a shell to run bq. I have played with the code provided at
https://cloud.google.com/bigquery/bigquery-api-quickstart
I don't understand how to use this bit of code to create a view instead of just returning the results of a SELECT statement. I can see the documentation about doing table inserts here
https://cloud.google.com/bigquery/docs/reference/v2/tables/insert
but that refers to using the REST API to generate new tables as opposed to the example provided above.
Is it just not possible? Should I just give in and use bq?
Thanks
*** Some additional questions in response to Felipe's comments.
The table resource document indicates that there are a number of required fields. Some of them make sense even if I don't fully understand what they're asking for; others do not. For example, externalDataConfiguration.schema. Does this refer to the schema for the database that I'm connecting to (I assume it does), or the schema for storing the data?
What about externalDataConfiguration.sourceFormat? Since I'm trying to make a view of a pre-existing database, I'm not sure I understand how the source format is relevant. Is it the source format of the database I'm making a view from? How would I identify that?
And externalDataConfiguration.sourceUris[]: I'm not importing new data into the database, so I don't understand why this (or the previous element) is required.
What about schema?
tableReference.datasetId, tableReference.projectId, and tableReference.tableId are self explanatory.
Type would be view, and view.query would be the actual sql query used to make the view. So I get why those are required for making a view, but I don't understand the other parts.
Can you help me understand these details?
Thanks, Brad
Using https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/insert
submit a request body like the one below, assuming you add the authorization:
{
"view": {
"query": "select column1, count(1) from `project.dataset.someTable` group by 1",
"useLegacySql": false
},
"tableReference": {
"tableId": "viewName",
"projectId": "projectName",
"datasetId": "datasetName"
}
}
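As a rough sketch, the same request can be assembled in Python with only the standard library. It fills in just the two parts shown in the body above, tableReference and view, and leaves out externalDataConfiguration and schema. The helper names build_view_body and build_insert_request are made up for this illustration, and the access token is assumed to be an OAuth 2.0 bearer token obtained separately. The request is only built here, not sent.

```python
import json
import urllib.request

# Base URL of the BigQuery v2 REST API.
API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def build_view_body(project_id, dataset_id, view_id, query, use_legacy_sql=False):
    """Return a tables.insert body describing a view (hypothetical helper)."""
    return {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": view_id,
        },
        "view": {"query": query, "useLegacySql": use_legacy_sql},
    }


def build_insert_request(body, access_token):
    """Wrap the body in a POST request for tables.insert (hypothetical helper).

    access_token is assumed to be a valid OAuth 2.0 bearer token.
    """
    ref = body["tableReference"]
    url = "{}/projects/{}/datasets/{}/tables".format(
        API_ROOT, ref["projectId"], ref["datasetId"]
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would submit it; that step is left out so the sketch can be run without credentials.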
Alternatively, in Python, assuming you have a service account key set up and the environment variable GOOGLE_APPLICATION_CREDENTIALS=/path/to/my/key. The one caveat is that, as far as I can tell, this can only create views using legacy SQL, and by extension those views can only be queried using legacy SQL, though the straight API method allows legacy or standard.
from google.cloud import bigquery

def create_view(dataset_name, view_name, project, viewSQL):
    bigquery_client = bigquery.Client(project=project)
    dataset = bigquery_client.dataset(dataset_name)
    table = dataset.table(view_name)
    table.view_query = viewSQL
    try:
        table.create()
        return True
    except Exception as err:
        print(err)
        return False
Note: this changed a little bit with 0.28.0 of the library - see the following for further details: Google BigQuery: creating a view via Python google-cloud-bigquery version 0.27.0 vs. 0.28.0
My example function:
# create a view via python
from google.cloud import bigquery
from google.cloud.bigquery import Table

def create_view(dataset_name, view_name, sqlQuery, project=None):
    try:
        bigquery_client = bigquery.Client(project=project)
        dataset_ref = bigquery_client.dataset(dataset_name)
        table_ref = dataset_ref.table(view_name)
        table = Table(table_ref)
        table.view_query = sqlQuery
        # use standard SQL for the view definition
        table.view_use_legacy_sql = False
        bigquery_client.create_table(table)
        return True
    except Exception as e:
        # log the error, then re-raise so the caller sees the failure
        errorStr = 'ERROR (create_view): ' + str(e)
        print(errorStr)
        raise
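Since the question mentions regenerating the views every night, note that create_table raises a conflict error if a view with that name already exists. One possible way around this, sketched below, is to delete the view first and then create it again. The helper names replace_view and refresh_views are made up for this illustration, and the sketch assumes a version of google-cloud-bigquery recent enough that delete_table accepts not_found_ok. There is a short window between the delete and the create in which the view does not exist.

```python
def replace_view(client, table):
    """Drop the view if it exists, then create it again (hypothetical helper).

    client is assumed to be a google.cloud.bigquery.Client and table a
    google.cloud.bigquery.Table with view_query already set.
    """
    # not_found_ok=True makes the delete a no-op when the view is missing,
    # for example on the very first run.
    client.delete_table(table, not_found_ok=True)
    return client.create_table(table)


def refresh_views(client, tables):
    """Recreate each view; return a list of (table, error) for failures."""
    failures = []
    for table in tables:
        try:
            replace_view(client, table)
        except Exception as err:
            # Keep going so one bad query does not block the other views.
            failures.append((table, err))
    return failures


# Usage sketch (needs google-cloud-bigquery and credentials; IDs are placeholders):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   view = bigquery.Table("my-project.my_dataset.my_view")
#   view.view_query = "SELECT column1 FROM `my-project.my_dataset.someTable`"
#   refresh_views(client, [view])
```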