I know that in Oracle it's possible to create custom aggregate functions that process a collection of values and return a single result. Edit: I've even read the friendly manual at docs.oracle.com/cd/B28359_01/appdev.111/b28425/aggr_functions.htm !
I also know that Oracle provides built-in analytic functions like DENSE_RANK
and RATIO_TO_REPORT
, which provide values for each input, relative to a collection/window of values that input lies within.
What I want to know is if there's a way to create my own analytic function, presumably in a similar manner as I can create my own aggregation function, and in particular create one with additional arguments in my custom analytic function.
When I refer to an "analytic function", please read it as a function which, in addition to accepting windowing parameters via the PARTITION
keyword, also can return different values within a given window. (If anyone has a better term for this, please let me know! Pure analytic function? DENSE_RANK
-class analytic function? Non-aggregate analytic function?)
The Oracle documentation notes that an aggregate function can be used as an analytic (windowed) function. Unfortunately, this only means that the PARTITION
keyword for specifying windowing in analytic functions can be applied to aggregate functions as well. It doesn't promote the aggregate function to my coveted status of being able to return different values within a fixed window.
SELECT SUM(income) OVER (PARTITION BY first_initial) AS total FROM data;
will have as many records as data
, but it will only have as many distinct total
s as there are first initials.
SELECT RATIO_TO_REPORT(income) OVER (PARTITION BY first_initial) AS ratio FROM data;
will have as many records as data
, AND, even within a given first_initial
partition, those ratio
s may all be distinct.
I have been provided with call-only access to a PL/SQL procedure which accepts a numeric collection as an IN OUT parameter, and which has a few other IN configuration parameters. The procedure modifies the collection's values (think of it like "The University's Proprietary Sanctioned and Required Grade Curving Procedure") in a manner influenced by the configuration parameters.
Currently, the process to use the procedure is to hardcode a cursor loop that detects a change from one data partition to another, and then within each partition fetches data into a collection which is then passed to the procedure, altered, and eventually dumped back out into a separate table. I planned to improve this by making a PIPELINED
PARALLEL_ENABLE
table function that encapsulates some of the logic, but I'd much prefer to enable queries like the following:
SELECT G.Course_ID
, G.Student_ID
, G.Raw_Grade
, analytic_wrapper(G.raw_grade, P.course_config_data)
OVER (PARTITION BY G.Course_ID) AS Adjusted_Grade
, P.course_config_data
FROM grades G
LEFT JOIN policies P
ON G.Course_ID = P.Course_ID;
That requires being able to create a custom analytic function, though, and due to the way the procedure requires different inputs on different partitions (e.g. the Course_ID
-specific P.course_config_data
above) it also has to accept not only the data-to-be-aggregated argument, but also additional inputs.
Is this possible, and if so where could I find documentation? My Google-fu has failed me.
The PL/SQL procedure I've been provided with is (effectively) non-deterministic, and its output has statistical properties which must be preserved. For example, if A={A[0], A[1], A[3]}
are the raw grades for one particular class, and B=f(A)
is the result of invoking the procedure on A
at 1:00 while C=f(A)
is the result of invoking the procedure on A
at 1:15, then B={B[0],B[1],B[2]}
and C={C[0],C[1],C[2]}
are both acceptable outputs to use, but a mixture of the elements like {C[0],B[1],C[2]}
is not acceptable.
The upshot of this is that the procedure must be called exactly once on each partition. (Well, technically, it can be wastefully called as many times as one wants, but all results for a partition must come from the same invocation).
Supposed, for example, that the procedure I've been supplied with operates as follows: It accepts a collection of grades as an IN OUT
parameter, and then sets one of those grades, chosen at random, to 100. All the other grades are set to zero. Running this at 1pm might result in Alice having the only passing grade, while running it at 1:01pm might result in Bob having the only passing grade. Regardless, it should be the case that exactly one student per class passes, no more and no less.
This version does not suffer from the caveats of my previous answer, although it is going to be slower and a little harder to use. Most of the slowness from the loop in ODCIAggregateDelete -- you may be able to find an improvement there that doesn't require looping through the whole collection.
Anyway, this version makes a custom analytic function that mimics Oracle's native COLLECT function. So, instead of trying to create a custom analytic function that computes the actual value we want, it merely computes the set of rows in the window.
Then, for each row, we pass in the row data and the results of our custom "COLLECT" analytic to a regular function that computes the value we want.
Here's the code. (NOTE: your original question also asked about multiple parameters. Easy -- just put all the fields you want into matt_ratio_to_report_rec
.) (Also, sorry about the object names -- I prefix my name on everything is other developers know who to ask if the object is causing them problems.)
-- This is the input data to the analytic function
--DROP TYPE matt_ratio_to_report_rec;
CREATE OR REPLACE TYPE matt_ratio_to_report_rec AS OBJECT
( value NUMBER );
-- This is a collection of input data
--DROP TYPE matt_ratio_to_report_tab;
CREATE OR REPLACE TYPE matt_ratio_to_report_tab AS TABLE OF matt_ratio_to_report_rec;
-- This object type implements a custom analytic that acts as an analytic version of Oracle's COLLECT function
--DROP TYPE matt_ratio_to_report_col_impl;
CREATE OR REPLACE TYPE matt_ratio_to_report_col_impl AS OBJECT (
analytics_window matt_ratio_to_report_tab,
CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl ) RETURN SELF AS RESULT,
-- Called to initialize a new aggregation context
-- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead
-- of creating the new aggregation context from scratch
STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER,
-- Called when a new data point is added to an aggregation context
MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER,
-- Called to return the computed aggragate from an aggregation context
MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER,
-- Called to merge to two aggregation contexts into one (e.g., merging results of parallel slaves)
MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER,
-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER
);
CREATE OR REPLACE TYPE BODY matt_ratio_to_report_col_impl IS
CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl ) RETURN SELF AS RESULT IS
BEGIN
SELF.analytics_window := new matt_ratio_to_report_tab();
RETURN;
END;
STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER IS
BEGIN
DBMS_OUTPUT.PUT_LINE('ODCIAggregateInitialize()');
sctx := matt_ratio_to_report_col_impl ();
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER IS
BEGIN
DBMS_OUTPUT.PUT_LINE('ODCIAggregateIterate(' || self.analytics_window.COUNT || ')');
-- Add record to collection
self.analytics_window.extend();
self.analytics_window(self.analytics_window.COUNT) := value;
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER IS
BEGIN
DBMS_OUTPUT.PUT_LINE('ODCIAggregateTerminate(' || self.analytics_window.COUNT || ' - flags: ' || flags || ')');
IF flags = 1 THEN
returnValue := self.analytics_window;
END IF;
RETURN ODCIConst.Success;
EXCEPTION
WHEN others THEN
DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK || ' ' || DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
RETURN ODCIConst.Success;
END;
MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER IS
BEGIN
-- DBMS_OUTPUT.PUT_LINE('ODCIAggregateMerge(' || self.window_sum || ' - ' || ctx2.window_sum || ')');
-- TODO: Add all elements from ctx2 window to self window
RETURN ODCIConst.Success;
END;
-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER IS
l_ctr NUMBER;
BEGIN
DBMS_OUTPUT.PUT_LINE('ODCIAggregateDelete(' || self.analytics_window.COUNT || ' - ' || value.value || ')');
l_ctr := self.analytics_window.FIRST;
<<window_loop>>
WHILE l_ctr IS NOT NULL LOOP
IF ( self.analytics_window(l_ctr).value = value.value ) THEN
self.analytics_window.DELETE(l_ctr);
DBMS_OUTPUT.PUT_LINE('... deleted slot ' || l_ctr);
EXIT window_loop;
END IF;
l_ctr := self.analytics_window.NEXT(l_ctr);
END LOOP;
RETURN ODCIConst.Success;
END;
END;
/
-- This function is the analytic version of Oracle's COLLECT function
--DROP FUNCTION matt_ratio_to_report;
CREATE OR REPLACE FUNCTION matt_ratio_to_report_col ( input matt_ratio_to_report_rec) RETURN matt_ratio_to_report_tab
PARALLEL_ENABLE AGGREGATE USING matt_ratio_to_report_col_impl;
/
-- This the actual function we want
CREATE OR REPLACE FUNCTION matt_ratio_to_report ( p_row_value NUMBER, p_report_window matt_ratio_to_report_tab ) RETURN NUMBER IS
l_report_window_sum NUMBER := 0;
l_counter NUMBER := NULL;
BEGIN
IF p_row_value IS NULL or p_report_window IS NULL THEN
RETURN NULL;
END IF;
-- Compute window sum
l_counter := p_report_window.FIRST;
WHILE l_counter IS NOT NULL LOOP
l_report_window_sum := l_report_window_sum + NVL(p_report_window(l_counter).value,0);
l_counter := p_report_window.NEXT(l_counter);
END LOOP;
RETURN p_row_value / NULLIF(l_report_window_sum,0);
END matt_ratio_to_report;
-- Create some test data
--DROP TABLE matt_test_data;
CREATE TABLE matt_test_data ( x, group# ) PARALLEL 4
AS SELECT rownum, ceil(rownum / 10) group# FROM DUAL CONNECT BY ROWNUM <= 50000;
-- TESTER 9/30
with test as (
SELECT d.x,
CEIL (d.x / 10) group#,
ratio_to_report (d.x) OVER (PARTITION BY d.group#) oracle_rr,
matt_ratio_to_report (
d.x,
matt_ratio_to_report_col (matt_ratio_to_report_rec (d.x)) OVER (PARTITION BY d.group#)) custom_rr
FROM matt_test_data d )
SELECT /*+ PARALLEL */ test.*, case when test.oracle_rr != test.custom_rr then 'Mismatch!' Else null END test_results from test
--where oracle_rr != custom_rr
ORDER BY test_results nulls last, x;
The only way I found to create a custom aggregate with multiple parameters is to create a new TYPE
with the desired number of elements and then pass an instance of that type to the aggregate:
First define the structure to hold all the "parameters" you need:
create or replace type wrapper_type as object
(
raw_grade integer,
config_data varchar
);
/
Then create your aggregate:
CREATE OR REPLACE TYPE analytic_wrapper AS OBJECT
(
.. variables you might need
STATIC FUNCTION ODCIAggregateInitialize(actx IN OUT wrapper_type) RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateIterate(self IN OUT wrapper_type, val IN wrapper_type) RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateTerminate(self IN wrapper_type, returnValue OUT number, flags IN NUMBER) RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateMerge(self IN OUT wrapper_type, ctx2 IN wrapper_type) RETURN NUMBER
);
/
Then you need to implement the actual aggregate logic in the type body
. Once that is done, you can use something like this:
select analytic_wrapper(wrapper_type(G.raw_grade, P.course_config_data))
from ...
The above was written more or less from memory, so I'm pretty sure it's full of syntax errors, but it should get you started.
More details and examples are in the manual: http://docs.oracle.com/cd/E11882_01/appdev.112/e10765/aggr_functions.htm#ADDCI026
The manual states that such an aggregate can be used as an analytical function:
When a user-defined aggregate is used as an analytic function, the aggregate is calculated for each row's corresponding window
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With