Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

JS UDF that returns a struct for Standard SQL / BigQuery and creates two columns

I’m trying to write a user defined function for BigQuery using Javascript that returns a struct and generates two columns:

CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64> LANGUAGE js AS
  LANGUAGE js AS 
"""
    var exampleStruct = {1:100, 2:200}
    return exampleStruct;
""";

My query would be something like this:

SELECT
exampleCol,
exampleFunction(stringCol) -- use SELECT AS STRUCT somewhere here? with the aliases “First” and “Second”
FROM
[SOME DATATBASE HERE]

I want the output of exampleFunction(stringCol) to generate two columns (three columns total if we include exampleCol). For example, if exampleCol gave us “SOMETHING”, I’d want to return the columns: “SOMETHING” for exampleCol, 1 for “First”, and 2 for “Second”. It this something that is possible?

I’m having trouble returning a STRUCT from the JS function (unsure if my syntax is off) and getting the query right. For the query, I want to avoid running the JavaScript function twice. Thanks!


2 Answers

I'd like to add to Mikhail Berlyant's answer, which works fine in this situation, but I've run into a problem in slightly different situation.

Instead of using "this" in JavaScript which retains data across rows, I propose using a new object to do it.

In my example, I want to return one more column, with this column value based on value of another existing column. I'll add one more column named "latencyUnder150" which will have value of 'Y' if latency field has value lower than 150, otherwise just leave the field blank.

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64, latencyUnder150 STRING> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.index = arr[0];
    this.latency = arr[1];
    if (this.latency < 150) {
        this.latencyUnder150 = 'Y'
    }
    return this;
""";
WITH `project.dataset.table` AS (
  SELECT 1 exampleCol, '10:100' stringCol UNION ALL
  SELECT 2, '20:200' UNION ALL
  SELECT 3, '30:456'
)
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`
-- ORDER BY exampleCol   

Using "this" variable in JS, you'll get this answer.

| Row | exampleCol | index | latency | latencyUnder150 |
|-----|------------|-------|---------|-----------------|
| 1   | 1          | 10    | 100     | Y               |
| 2   | 2          | 20    | 200     | Y               |
| 3   | 3          | 30    | 456     | Y               |

You can see that the field latencyUnder150 retains value of "Y" from the first record.

By slightly changing code to use a new object, each row starts without value from previous row.

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64, latencyUnder150 STRING> 
  LANGUAGE js AS 
"""
    var outObj = {}
    arr = exampleString.split(':');
    outObj.index = arr[0];
    outObj.latency = arr[1];
    if (outObj.latency < 150) {
        outObj.latencyUnder150 = 'Y'
    }
    return outObj;
""";
WITH `project.dataset.table` AS (
  SELECT 1 exampleCol, '10:100' stringCol UNION ALL
  SELECT 2, '20:200' UNION ALL
  SELECT 3, '30:456'
)
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`
-- ORDER BY exampleCol   
| Row | exampleCol | index | latency | latencyUnder150 |
|-----|------------|-------|---------|-----------------|
| 1   | 1          | 10    | 100     | Y               |
| 2   | 2          | 20    | 200     | (null)          |
| 3   | 3          | 30    | 456     | (null)          |
like image 112
krissdap Avatar answered Dec 14 '25 14:12

krissdap


Below example is for BigQuery Standard SQL

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.index = arr[0];
    this.latency = arr[1];
    return this;
""";
WITH `project.dataset.table` AS (
  SELECT 1 exampleCol, '10:100' stringCol UNION ALL
  SELECT 2, '20:200' UNION ALL
  SELECT 3, '30:456'
)
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`
-- ORDER BY exampleCol   

with result

Row exampleCol  index   latency  
1   1           10      100  
2   2           20      200  
3   3           30      456   

Note: if you want columns to be aliased with First, Second - you can either replace index and latency with respectively first, second like in below example

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<first INT64, second INT64> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.first = arr[0];
    this.second = arr[1];
    return this;
""";
SELECT exampleCol, exampleFunction(stringCol).*
FROM `project.dataset.table`  

or you can use below approach

#standardSQL
CREATE TEMP FUNCTION exampleFunction(exampleString STRING)
  RETURNS STRUCT<index INT64, latency INT64> 
  LANGUAGE js AS 
"""
    arr = exampleString.split(':');
    this.index = arr[0];
    this.latency = arr[1];
    return this;
""";
SELECT exampleCol, index AS first, latency AS second   
FROM (
  SELECT exampleCol, exampleFunction(stringCol).*
  FROM `project.dataset.table`
)

with below result in both cases

Row exampleCol  first   second   
1   1           10      100  
2   2           20      200  
3   3           30      456  
like image 44
Mikhail Berlyant Avatar answered Dec 14 '25 15:12

Mikhail Berlyant



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!