Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

BigQuery - Check if table already exists

I have a dataset in BigQuery. This dataset contains multiple tables.

I am doing the following steps programmatically using the BigQuery API:

  1. Querying the tables in the dataset - Since my response is too large, I am enabling allowLargeResults parameter and diverting my response to a destination table.

  2. I am then exporting the data from the destination table to a GCS bucket.

Requirements:

  • Suppose my process fails at Step 2, I would like to re-run this step.

  • But before I re-run, I would like to check/verify that the specific destination table named 'xyz' already exists in the dataset.

  • If it exists, I would like to re-run step 2.

  • If it does not exist, I would like to do foo.

How can I do this?

Thanks in advance.

like image 773
activelearner Avatar asked Feb 25 '15 22:02

activelearner


People also ask

How do I check if a table exists in a large query?

// tableExists checks wheter a table exists on a given dataset.

How do I recover a deleted table in BigQuery?

In the Google Cloud console, go to the BigQuery page. In the Explorer pane, expand the project and dataset nodes of the table snapshot you want to restore from. Click the name of the table snapshot. In the table snapshot pane that appears, click Restore.

How do I delete a table in BigQuery?

$dataset = $bigQuery->dataset($datasetId); $table = $dataset->table($tableId); $table->delete();


2 Answers

Alex F's solution works on v0.27, but will not work on later versions. In order to migrate to v0.28+, the below solution will work.

from google.cloud import bigquery

project_nm = 'gc_project_nm'
dataset_nm = 'ds_nm'
table_nm = 'tbl_nm'

client = bigquery.Client(project_nm)
dataset = client.dataset(dataset_nm)
table_ref = dataset.table(table_nm)

def if_tbl_exists(client, table_ref):
    from google.cloud.exceptions import NotFound
    try:
        client.get_table(table_ref)
        return True
    except NotFound:
        return False

if_tbl_exists(client, table_ref)
like image 121
tarheel Avatar answered Oct 17 '22 04:10

tarheel


Here is a python snippet that will tell whether a table exists (deleting it in the process--careful!):

def doesTableExist(project_id, dataset_id, table_id):
  bq.tables().delete(
      projectId=project_id, 
      datasetId=dataset_id,
      tableId=table_id).execute()
  return False

Alternately, if you'd prefer not deleting the table in the process, you could try:

def doesTableExist(project_id, dataset_id, table_id):
  try:
    bq.tables().get(
        projectId=project_id, 
        datasetId=dataset_id,
        tableId=table_id).execute()
    return True
  except HttpError, err
    if err.resp.status <> 404:
       raise
    return False

If you want to know where bq came from, you can call build_bq_client from here: http://code.google.com/p/bigquery-e2e/source/browse/samples/ch12/auth.py

In general, if you're using this to test whether you should run a job that will modify the table, it can be a good idea to just do the job anyway, and use WRITE_TRUNCATE as a write disposition.

Another approach can be to create a predictable job id, and retry the job with that id. If the job already exists, the job already ran (you might want to double check to make sure the job didn't fail, however).

like image 14
Jordan Tigani Avatar answered Oct 17 '22 04:10

Jordan Tigani