Is it possible to invoke BigQuery procedures in python client?

Question

Scripting/procedures for BigQuery just came out in beta - is it possible to invoke procedures using the BigQuery python client?

I tried:

query = """CALL `myproject.dataset.procedure`()...."""
job = client.query(query, location="US",)
print(job.results())
print(job.ddl_operation_performed)

print(job._properties)

but that didn't give me the result set from the procedure. Is it possible to get the results?

Thank you!

Edited - stored procedure I am calling

CREATE OR REPLACE PROCEDURE `Project.Dataset.Table`(IN country STRING, IN accessDate DATE, IN accessId, OUT saleExists INT64)
BEGIN
  IF EXISTS (SELECT 1 FROM dataset.table WHERE purchaseCountry = country AND purchaseDate = accessDate AND customerId = accessId)
  THEN
    SET saleExists = (SELECT 1);
  ELSE
    INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate, customerId) VALUES (country, accessDate, accessId);
    SET saleExists = (SELECT 0);
  END IF;
END;

Tim Swast · Accepted Answer

If you follow the CALL statement with a SELECT statement, you can get the procedure's output value as a result set. For example, I created the following stored procedure:

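CREATE OR REPLACE PROCEDURE `my-project.test_dataset.top_names`(OUT top_shakespeare_names ARRAY<STRING>)
BEGIN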
  -- Build an array of the top 100 names from the year 2017.
DECLARE
  top_names ARRAY<STRING>;
SET
  top_names = (
  SELECT
    ARRAY_AGG(name
    ORDER BY
      number DESC
    LIMIT
      100)
  FROM
    `bigquery-public-data.usa_names.usa_1910_current`
  WHERE
    year = 2017 );
  -- Which names appear as words in Shakespeare's plays?
SET
  top_shakespeare_names = (
  SELECT
    ARRAY_AGG(name)
  FROM
    UNNEST(top_names) AS name
  WHERE
    name IN (
    SELECT
      word
    FROM
      `bigquery-public-data.samples.shakespeare` ));
END

Running the following script returns the procedure's output value as the top-level result set.

DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;

In Python:

from google.cloud import bigquery

client = bigquery.Client()
query_string = """
DECLARE top_shakespeare_names ARRAY<STRING> DEFAULT NULL;
CALL `my-project.test_dataset.top_names`(top_shakespeare_names);
SELECT top_shakespeare_names;
"""
query_job = client.query(query_string)
rows = list(query_job.result())
print(rows)
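
The DECLARE / CALL / SELECT pattern above can be wrapped in a small helper. This is only a sketch: `build_call_script` and `call_procedure` are hypothetical names, not part of the client library, and the helper assumes the procedure takes its single OUT parameter last. Any `in_args` are pasted into the script as already-rendered SQL expressions, so they should never come from untrusted input.

```python
def build_call_script(procedure, out_name, out_type, in_args=()):
    """Build a script that calls a procedure and selects its OUT argument.

    `in_args` are SQL expressions (already rendered as text) that are passed
    before the OUT argument. Assumption: the OUT parameter is declared last.
    """
    args = ", ".join(list(in_args) + [out_name])
    return (
        f"DECLARE {out_name} {out_type} DEFAULT NULL;\n"
        f"CALL `{procedure}`({args});\n"
        f"SELECT {out_name};"
    )


def call_procedure(client, procedure, out_name, out_type, in_args=()):
    """Run the script and return the rows of its final SELECT as a list."""
    script = build_call_script(procedure, out_name, out_type, in_args)
    # `client` is expected to be a google.cloud.bigquery.Client. As in the
    # snippet above, result() on the script's job gives the rows of the
    # trailing SELECT.
    return list(client.query(script).result())
```

With a real client this would be called as `call_procedure(bigquery.Client(), "my-project.test_dataset.top_names", "top_shakespeare_names", "ARRAY<STRING>")`.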

Related: If you have SELECT statements within a stored procedure, you can walk the script's child jobs to fetch their results, even if a SELECT statement isn't the last statement in the procedure.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# Run a SQL script.
sql_script = """
-- Declare a variable to hold names as an array.
DECLARE top_names ARRAY<STRING>;

-- Build an array of the top 100 names from the year 2000.
SET top_names = (
SELECT ARRAY_AGG(name ORDER BY number DESC LIMIT 100)
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE year = 2000
);

-- Which names appear as words in Shakespeare's plays?
SELECT
name AS shakespeare_name
FROM UNNEST(top_names) AS name
WHERE name IN (
SELECT word
FROM `bigquery-public-data.samples.shakespeare`
);
"""
parent_job = client.query(sql_script)

# Wait for the whole script to finish.
rows_iterable = parent_job.result()
print("Script created {} child jobs.".format(parent_job.num_child_jobs))

# Fetch result rows for the final sub-job in the script.
rows = list(rows_iterable)
print("{} of the top 100 names from year 2000 also appear in Shakespeare's works.".format(len(rows)))

# Fetch jobs created by the SQL script.
child_jobs_iterable = client.list_jobs(parent_job=parent_job)
for child_job in child_jobs_iterable:
    child_rows = list(child_job.result())
    print("Child job with ID {} produced {} rows.".format(child_job.job_id, len(child_rows)))
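
If you want every intermediate result set rather than just row counts, the loop above can be turned into a helper that returns the rows themselves. This is a sketch under assumptions: `collect_child_results` is a hypothetical name, and it relies only on the `job_id`, `created` and `statement_type` attributes of the child jobs plus `client.list_jobs(parent_job=...)` as used above. Sorting on `created` is a choice made here so that the output order does not depend on the order in which the API lists the jobs.

```python
def collect_child_results(client, parent_job):
    """Return one dict per child job of a script, in creation order."""
    # list_jobs(parent_job=...) yields the jobs the script spawned. Sorting on
    # the creation timestamp makes the order independent of how the API
    # happens to list them.
    child_jobs = sorted(
        client.list_jobs(parent_job=parent_job),
        key=lambda job: job.created,
    )
    return [
        {
            "job_id": job.job_id,
            # e.g. "SELECT" or "INSERT"; useful for filtering afterwards.
            "statement_type": job.statement_type,
            "rows": list(job.result()),
        }
        for job in child_jobs
    ]
```

Calling `collect_child_results(client, parent_job)` after `parent_job.result()` has returned gives a list you can filter, for example on `statement_type`, to pick out the result sets you care about.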

Yun Zhang · Answer

It works if the procedure itself contains a SELECT statement. Given this procedure:

create or replace procedure dataset.proc_output() BEGIN
  SELECT t FROM UNNEST(['1','2','3']) t;
END;

Code:

from google.cloud import bigquery

client = bigquery.Client()
query = """CALL dataset.proc_output()"""
job = client.query(query, location="US")
for result in job.result():
    print(result)

This prints (under Python 3):

Row(('1',), {'t': 0})
Row(('2',), {'t': 0})
Row(('3',), {'t': 0})

However, if there are multiple SELECT statements inside a procedure, only the last result set can be fetched this way.

Update

See the example below:

CREATE OR REPLACE PROCEDURE zyun.exists(IN country STRING, IN accessDate DATE, OUT saleExists INT64)
BEGIN
  SET saleExists = (WITH data AS (SELECT "US" purchaseCountry, DATE "2019-1-1" purchaseDate)
    SELECT Count(*) FROM data where purchaseCountry = country and purchaseDate=accessDate);
  IF saleExists = 0  THEN
    INSERT Dataset.MissingSalesTable (purchaseCountry, purchaseDate) VALUES (country, accessDate);
  END IF;
END;
BEGIN
  DECLARE saleExists INT64;
  CALL zyun.exists("US", DATE "2019-2-1", saleExists);
  SELECT saleExists;
END

BTW, your example is much better served with a single MERGE statement instead of a script.

Tags:

google-api-python-client

google-bigquery
