Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

List Scheduled Queries in BigQuery

I have a need to (programatically) analyse the details of Scheduled Queries in BigQuery (eg which tables are updated and which tables accessed in the SQL). I have done something similar for the BQ Tables/Views using Apps Script BigQuery.Tables.list(), but I cannot find an API to access the Scheduled Queries.

The UI is able to list them, so I feel this should be possible programmatically, e.g. via a REST API. Does anyone know if this is possible, what interface is supported (Apps Script, REST ...), and possibly an example of how to use it.

like image 275
Olde Ted Avatar asked Apr 18 '19 12:04

Olde Ted


2 Answers

Scheduled queries are part of BigQuery's Data Transfer Service so you have to use its API. In particular, the projects.transferConfigs.list method. Fill in the dataSourceIds field with scheduled_query and parent with projects/PROJECT_ID. As discussed in the comments, if you are using a regional location such as europe-west2 instead of a multi-regional one (EU or US) you should use projects.locations.transferConfigs.list instead. Now, parent resource will be in the form of projects/PROJECT_ID/locations/REGIONAL_LOCATION.

In addition, for other transfers you can get the corresponding dataSourceIds using the projects.dataSources.list method. That's how I got the scheduled_query one.

Response will be an array of scheduled queries such as:

{
  "name": "projects/<PROJECT_NUMBER>/locations/us/transferConfigs/<TRANSFER_CONFIG_ID>",
  "destinationDatasetId": "<DATASET>",
  "displayName": "hacker-news",
  "updateTime": "2018-11-14T15:39:18.897911Z",
  "dataSourceId": "scheduled_query",
  "schedule": "every 24 hours",
  "nextRunTime": "2019-04-19T15:39:00Z",
  "params": {
    "write_disposition": "WRITE_APPEND",
    "query": "SELECT @run_time AS time,\n  title,\n  author,\n  text\nFROM `bigquery-public-data.hacker_news.stories`\nLIMIT\n  1000",
    "destination_table_name_template": "hacker_daily_news"
  },
  "state": "SUCCEEDED",
  "userId": "<USER_ID>",
  "datasetRegion": "us"
}

Example of an API call with bash and curl:

#!/bin/bash

# parameter(s)
location=europe-west2

authToken="$(gcloud auth print-access-token)"
projectId=$(gcloud config get-value project 2>\dev\null)

# API call
scheduled_queries=$(curl  -H "Authorization: Bearer $authToken" \
https://bigquerydatatransfer.googleapis.com/v1/projects/$projectId/locations/$location/transferConfigs?dataSourceIds=scheduled_query)

# pretty print results
echo $scheduled_queries | python -m json.tool
like image 174
Guillem Xercavins Avatar answered Oct 11 '22 04:10

Guillem Xercavins


The above answers are excellent responses for using the REST API. For completeness, I'd like to include the CLI commands approach for solving the same thing. Personally I find this better suited for shell scripts but YMMV.

Example: List of Schedule Queries from Default Project.

bq ls --transfer_config --transfer_location=US --format=prettyjson

Example: Details of a Schedule Query from Default Project.

bq show --format=prettyjson --transfer_config [RESOURCE_NAME]
# RESOURCE_NAME is a value you can get from the above bq ls command.

Further details can be found here.

like image 5
Mark Spiezio Avatar answered Oct 11 '22 04:10

Mark Spiezio