BigQuery User Defined Aggregation Function?

I know I can define a User Defined Function in order to perform some custom calculation. I also know I can use the 'out-of-the-box' aggregation functions to reduce a collection of values to a single value when using a GROUP BY clause.

Is it possible to define a custom (user-defined) aggregation function to use with a GROUP BY clause?

Stewart_R asked May 15 '17 08:05



1 Answer

It turns out that this IS possible, as long as each group being aggregated is small enough to fit comfortably in memory, with a little bit of 'glue': the ARRAY_AGG function.

The steps are as follows:

  1. Create a UDF with an input parameter of type ARRAY<T> where T is the type of value you want to aggregate.
  2. Use the ARRAY_AGG function in the query with the GROUP BY clause to generate an array of T, and pass that array into your UDF.

As a concrete example:

CREATE TEMP FUNCTION aggregate_fruits(fruits ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
return "my fruit bag contains these items: " + fruits.join(",");
""";

WITH fruits AS
(SELECT "apple" AS fruit
UNION ALL SELECT "pear" AS fruit
UNION ALL SELECT "banana" AS fruit)

SELECT aggregate_fruits(ARRAY_AGG(fruit))
FROM fruits
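
The query above aggregates the whole table into a single group. As a rough sketch of the same pattern with an explicit GROUP BY, the variant below builds one array per group and passes each one to the UDF. The function name describe_bag, the purchases data and the shopper column are made up for illustration; only ARRAY_AGG and the array-typed UDF parameter come from the steps above.

```sql
-- Hypothetical UDF: receives all values of one group as a JavaScript array.
CREATE TEMP FUNCTION describe_bag(fruits ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
  return fruits.length + " item(s): " + fruits.join(",");
""";

-- Made-up sample data: two shoppers with different fruits.
WITH purchases AS (
  SELECT "alice" AS shopper, "apple" AS fruit
  UNION ALL SELECT "alice", "pear"
  UNION ALL SELECT "bob", "banana"
)
SELECT
  shopper,
  -- ARRAY_AGG collects the fruits of each shopper; ORDER BY makes the array order deterministic.
  describe_bag(ARRAY_AGG(fruit ORDER BY fruit)) AS bag
FROM purchases
GROUP BY shopper
ORDER BY shopper;
```

This should return one row per shopper, with the UDF applied to that shopper's fruits only. Because every group is materialised as an array before the UDF runs, the memory caveat mentioned above applies per group.
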

Stewart_R answered Oct 03 '22 07:10