Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I use the BigQuery REST API from the command line?

Attempting to make a plain GET request to one of the BigQuery REST APIs gives an error that looks like this:

curl https://www.googleapis.com/bigquery/v2/projects/$PROJECT_ID/jobs/$JOBID

Output:

{
 "error": {
  "errors": [
   {
    "domain": "global",
    "reason": "required",
    "message": "Login Required",
    "locationType": "header",
    "location": "Authorization",
  ...

What is the correct way to invoke one of the REST APIs from the command-line, such as the query or insert APIs? The API reference has a "Try this API", but the examples don't translate directly to something you can run from the command-line.

like image 778
Elliott Brossard Avatar asked Jul 14 '17 16:07

Elliott Brossard


Video Answer


1 Answers

As a disclaimer, when working from the command-line, using the bq tool will usually be sufficient, or for more complex use cases, the BigQuery client libraries enable programming with BigQuery from multiple languages. It can still be useful sometimes to make plain requests to the REST APIs to see how certain APIs work at a low level, however.

First, make sure that you have installed the Google Cloud SDK. This should include the gcloud and bq command-line tools. If you haven't already, authorize your account by running this command from your terminal:

gcloud auth login

This should prompt you to log in and then give you an access code that you can paste into your terminal. (The exact process may change over time).

Now let's try a query using the BigQuery REST API, calling the jobs.query method. Modify this script with your own project name, which you can find from the Google Cloud Console, then paste the script into your terminal:

PROJECT="YOUR_PROJECT_NAME"
QUERY="\"SELECT 1 AS x, 'foo' AS y;\""
REQUEST="{\"kind\":\"bigquery#queryRequest\",\"useLegacySql\":false,\"query\":$QUERY}"
echo $REQUEST | \
  curl -X POST -d @- -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://www.googleapis.com/bigquery/v2/projects/$PROJECT/queries

If it worked, you should see output that looks like this:

{
 "kind": "bigquery#queryResponse",
 "schema": {
  "fields": [
   {
    "name": "x",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "y",
    "type": "STRING",
    "mode": "NULLABLE"
   }
  ]
 },
 "jobReference": {
  "projectId": "<your project ID>",
  "jobId": "<your job ID>"
 },
 "totalRows": "1",
 "rows": [
  {
   "f": [
    {
     "v": "1"
    },
    {
     "v": "foo"
    }
   ]
  }
 ],
 "totalBytesProcessed": "0",
 "jobComplete": true,
 "cacheHit": false
}

If you haven't set up the bq command-line tool, you can use bq init from your terminal to do so. Once you have, you can try running the same query using it:

bq query --use_legacy_sql=False "SELECT 1 AS x, 'foo' AS y;"

You can also see the REST API requests that the bq tool makes by passing the --apilog= option:

bq --apilog= query --use_legacy_sql=False "SELECT [1, 2, 3] AS x;"

Now let's try an example using the jobs.insert method instead of the query API. Run this script, replacing YOUR_PROJECT_NAME with your project name:

PROJECT="YOUR_PROJECT_NAME"
QUERY="\"SELECT 1 AS x, 'foo' AS y;\""
REQUEST="{\"configuration\":{\"query\":{\"useLegacySql\":false,\"query\":${QUERY}}}}"
echo $REQUEST | \
curl -X POST -d @- -H "Content-Type: application/json" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://www.googleapis.com/bigquery/v2/projects/$PROJECT/jobs

Unlike the query API, which returned a response immediately, you will see a result that looks similar to this:

{
 "kind": "bigquery#job",
 "etag": "\"<etag string>\"",
 "id": "<project name>:<job ID>",
 "selfLink": "https://www.googleapis.com/bigquery/v2/projects/<project name>/jobs/<job ID>",
 "jobReference": {
  "projectId": "<project name>",
  "jobId": "<job ID>"
 },
 "configuration": {
  "query": {
   "query": "SELECT 1 AS x, 'foo' AS y;",
   "destinationTable": {
    "projectId": "<project name>",
    "datasetId": "<anonymous dataset>",
    "tableId": "<anonymous table>"
   },
   "createDisposition": "CREATE_IF_NEEDED",
   "writeDisposition": "WRITE_TRUNCATE",
   "useLegacySql": false
  }
 },
 "status": {
  "state": "RUNNING"
 },
 "statistics": {
  "creationTime": "<timestamp millis>",
  "startTime": "<timestamp millis>"
 },
 "user_email": "<your email address>"
}

Notice the status:

 "status": {
  "state": "RUNNING"
 },

If you want to check on the job now, you can use the jobs.get method. Similar to before, run this from your terminal, using the job ID from the output in the previous step:

PROJECT="YOUR_PROJECT_NAME"
JOB_ID="YOUR_JOB_ID"
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://www.googleapis.com/bigquery/v2/projects/$PROJECT/jobs/$JOB_ID

If the query is done, you'll get a response that indicates as much:

...
"status": {
 "state": "DONE"
},
...

Finally, we can make a request to fetch the query results, also using the REST API.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://www.googleapis.com/bigquery/v2/projects/$PROJECT/queries/$JOB_ID

The output will look similar to when we used the jobs.query method above:

{
 "kind": "bigquery#getQueryResultsResponse",
 "etag": "\"<etag string>\"",
 "schema": {
  "fields": [
   {
    "name": "x",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "y",
    "type": "STRING",
    "mode": "NULLABLE"
   }
  ]
 },
 "jobReference": {
  "projectId": "<project ID>",
  "jobId": "<job ID>"
 },
 "totalRows": "1",
 "rows": [
  {
   "f": [
    {
     "v": "1"
    },
    {
     "v": "foo"
    }
   ]
  }
 ],
 "totalBytesProcessed": "0",
 "jobComplete": true,
 "cacheHit": true
}
like image 191
Elliott Brossard Avatar answered Oct 02 '22 05:10

Elliott Brossard