Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating a custom, multi-argument Oracle analytic function

Background

I know that in Oracle it's possible to create custom aggregate functions that process a collection of values and return a single result. Edit: I've even read the friendly manual at docs.oracle.com/cd/B28359_01/appdev.111/b28425/aggr_functions.htm !

I also know that Oracle provides built-in analytic functions like DENSE_RANK and RATIO_TO_REPORT, which provide values for each input, relative to a collection/window of values that input lies within.

Problem

What I want to know is if there's a way to create my own analytic function, presumably in a similar manner as I can create my own aggregation function, and in particular create one with additional arguments in my custom analytic function.

Subtle terminological caveat

When I refer to an "analytic function", please read it as a function which, in addition to accepting windowing parameters via the PARTITION keyword, also can return different values within a given window. (If anyone has a better term for this, please let me know! Pure analytic function? DENSE_RANK-class analytic function? Non-aggregate analytic function?)

The Oracle documentation notes that an aggregate function can be used as an analytic (windowed) function. Unfortunately, this only means that the PARTITION keyword for specifying windowing in analytic functions can be applied to aggregate functions as well. It doesn't promote the aggregate function to my coveted status of being able to return different values within a fixed window.

Aggregate used as analytic:

SELECT SUM(income) OVER (PARTITION BY first_initial) AS total FROM data;

will have as many records as data, but it will only have as many distinct totals as there are first initials.

Analytic used as analytic:

SELECT RATIO_TO_REPORT(income) OVER (PARTITION BY first_initial) AS ratio FROM data;

will have as many records as data, AND, even within a given first_initial partition, those ratios may all be distinct.

Context

I have been provided with call-only access to a PL/SQL procedure which accepts a numeric collection as an IN OUT parameter, and which has a few other IN configuration parameters. The procedure modifies the collection's values (think of it like "The University's Proprietary Sanctioned and Required Grade Curving Procedure") in a manner influenced by the configuration parameters.

Currently, the process to use the procedure is to hardcode a cursor loop that detects a change from one data partition to another, and then within each partition fetches data into a collection which is then passed to the procedure, altered, and eventually dumped back out into a separate table. I planned to improve this by making a PIPELINED PARALLEL_ENABLE table function that encapsulates some of the logic, but I'd much prefer to enable queries like the following:

SELECT G.Course_ID
     , G.Student_ID
     , G.Raw_Grade
     , analytic_wrapper(G.raw_grade, P.course_config_data)
                  OVER (PARTITION BY G.Course_ID) AS Adjusted_Grade
     , P.course_config_data
  FROM      grades   G
  LEFT JOIN policies P
  ON G.Course_ID = P.Course_ID;

That requires being able to create a custom analytic function, though, and due to the way the procedure requires different inputs on different partitions (e.g. the Course_ID-specific P.course_config_data above) it also has to accept not only the data-to-be-aggregated argument, but also additional inputs.

Is this possible, and if so where could I find documentation? My Google-fu has failed me.

An Additional Wrinkle

The PL/SQL procedure I've been provided with is (effectively) non-deterministic, and its output has statistical properties which must be preserved. For example, if A={A[0], A[1], A[3]} are the raw grades for one particular class, and B=f(A) is the result of invoking the procedure on A at 1:00 while C=f(A) is the result of invoking the procedure on A at 1:15, then B={B[0],B[1],B[2]} and C={C[0],C[1],C[2]} are both acceptable outputs to use, but a mixture of the elements like {C[0],B[1],C[2]} is not acceptable.

The upshot of this is that the procedure must be called exactly once on each partition. (Well, technically, it can be wastefully called as many times as one wants, but all results for a partition must come from the same invocation).

Supposed, for example, that the procedure I've been supplied with operates as follows: It accepts a collection of grades as an IN OUT parameter, and then sets one of those grades, chosen at random, to 100. All the other grades are set to zero. Running this at 1pm might result in Alice having the only passing grade, while running it at 1:01pm might result in Bob having the only passing grade. Regardless, it should be the case that exactly one student per class passes, no more and no less.

like image 496
Bryant Avatar asked Aug 25 '15 18:08

Bryant


2 Answers

This version does not suffer from the caveats of my previous answer, although it is going to be slower and a little harder to use. Most of the slowness from the loop in ODCIAggregateDelete -- you may be able to find an improvement there that doesn't require looping through the whole collection.

Anyway, this version makes a custom analytic function that mimics Oracle's native COLLECT function. So, instead of trying to create a custom analytic function that computes the actual value we want, it merely computes the set of rows in the window.

Then, for each row, we pass in the row data and the results of our custom "COLLECT" analytic to a regular function that computes the value we want.

Here's the code. (NOTE: your original question also asked about multiple parameters. Easy -- just put all the fields you want into matt_ratio_to_report_rec.) (Also, sorry about the object names -- I prefix my name on everything is other developers know who to ask if the object is causing them problems.)

-- This is the input data to the analytic function
--DROP TYPE matt_ratio_to_report_rec;
CREATE OR REPLACE TYPE matt_ratio_to_report_rec AS OBJECT
  ( value   NUMBER );

-- This is a collection of input data  
--DROP TYPE matt_ratio_to_report_tab;
CREATE OR REPLACE TYPE matt_ratio_to_report_tab AS TABLE OF matt_ratio_to_report_rec;


-- This object type implements a custom analytic that acts as an analytic version of Oracle's COLLECT function
--DROP TYPE matt_ratio_to_report_col_impl;
CREATE OR REPLACE TYPE matt_ratio_to_report_col_impl AS OBJECT (
  analytics_window    matt_ratio_to_report_tab,
  CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl ) RETURN SELF AS RESULT,  
-- Called to initialize a new aggregation context
-- For analytic functions, the aggregation context of the *previous* window is passed in, so we only need to adjust as needed instead 
-- of creating the new aggregation context from scratch
  STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER,
-- Called when a new data point is added to an aggregation context  
  MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER,
-- Called to return the computed aggragate from an aggregation context
  MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER,
-- Called to merge to two aggregation contexts into one (e.g., merging results of parallel slaves) 
  MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER,
  -- ODCIAggregateDelete
  MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER  
);

CREATE OR REPLACE TYPE BODY matt_ratio_to_report_col_impl IS

CONSTRUCTOR FUNCTION matt_ratio_to_report_col_impl(SELF IN OUT NOCOPY matt_ratio_to_report_col_impl ) RETURN SELF AS RESULT IS
BEGIN
  SELF.analytics_window := new matt_ratio_to_report_tab();
  RETURN;
END;


STATIC FUNCTION ODCIAggregateInitialize (sctx IN OUT matt_ratio_to_report_col_impl) RETURN NUMBER IS
BEGIN
  DBMS_OUTPUT.PUT_LINE('ODCIAggregateInitialize()');
  sctx := matt_ratio_to_report_col_impl ();
  RETURN ODCIConst.Success;
END;


MEMBER FUNCTION ODCIAggregateIterate (self IN OUT matt_ratio_to_report_col_impl, value IN matt_ratio_to_report_rec ) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateIterate(' || self.analytics_window.COUNT || ')');

  -- Add record to collection
  self.analytics_window.extend();
  self.analytics_window(self.analytics_window.COUNT) :=  value;
  RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateTerminate (self IN matt_ratio_to_report_col_impl, returnValue OUT matt_ratio_to_report_tab, flags IN NUMBER) RETURN NUMBER IS
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateTerminate(' || self.analytics_window.COUNT || ' - flags: ' || flags || ')');
  IF flags = 1 THEN
    returnValue := self.analytics_window;
  END IF;
  RETURN ODCIConst.Success;
EXCEPTION
  WHEN others THEN 
    DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.FORMAT_ERROR_STACK || ' ' || DBMS_UTILITY.FORMAT_ERROR_BACKTRACE);
    RETURN ODCIConst.Success;
END;

MEMBER FUNCTION ODCIAggregateMerge (self IN OUT matt_ratio_to_report_col_impl, ctx2 IN matt_ratio_to_report_col_impl) RETURN NUMBER IS
BEGIN
--   DBMS_OUTPUT.PUT_LINE('ODCIAggregateMerge(' || self.window_sum || ' - ' || ctx2.window_sum || ')');
  -- TODO: Add all elements from ctx2 window to self window
  RETURN ODCIConst.Success;
END;

-- ODCIAggregateDelete
MEMBER FUNCTION ODCIAggregateDelete(self IN OUT matt_ratio_to_report_col_impl, value matt_ratio_to_report_rec) RETURN NUMBER IS
  l_ctr NUMBER;
BEGIN
   DBMS_OUTPUT.PUT_LINE('ODCIAggregateDelete(' || self.analytics_window.COUNT || ' - ' || value.value || ')');
   l_ctr := self.analytics_window.FIRST;
   <<window_loop>>
   WHILE l_ctr IS NOT NULL LOOP
     IF ( self.analytics_window(l_ctr).value = value.value ) THEN
       self.analytics_window.DELETE(l_ctr);
       DBMS_OUTPUT.PUT_LINE('... deleted slot ' || l_ctr);
       EXIT window_loop;
     END IF;
     l_ctr := self.analytics_window.NEXT(l_ctr);
   END LOOP;
  RETURN ODCIConst.Success;

END;  

END;
/

-- This function is the analytic version of Oracle's COLLECT function
--DROP FUNCTION matt_ratio_to_report;
CREATE OR REPLACE FUNCTION matt_ratio_to_report_col ( input matt_ratio_to_report_rec) RETURN matt_ratio_to_report_tab
PARALLEL_ENABLE AGGREGATE USING matt_ratio_to_report_col_impl;
/


-- This the actual function we want
CREATE OR REPLACE FUNCTION matt_ratio_to_report ( p_row_value NUMBER, p_report_window matt_ratio_to_report_tab ) RETURN NUMBER IS
  l_report_window_sum NUMBER := 0;
  l_counter NUMBER := NULL;
BEGIN
  IF p_row_value IS NULL or p_report_window IS NULL THEN
    RETURN NULL;
  END IF;

  -- Compute window sum
  l_counter := p_report_window.FIRST;
  WHILE l_counter IS NOT NULL LOOP
    l_report_window_sum := l_report_window_sum + NVL(p_report_window(l_counter).value,0);
    l_counter := p_report_window.NEXT(l_counter);
  END LOOP;

  RETURN p_row_value / NULLIF(l_report_window_sum,0);
END matt_ratio_to_report;  



-- Create some test data
--DROP TABLE matt_test_data;
CREATE TABLE matt_test_data ( x, group# ) PARALLEL 4
AS SELECT rownum, ceil(rownum / 10) group# FROM DUAL CONNECT BY ROWNUM <= 50000;


-- TESTER 9/30
with test as (
SELECT d.x,
       CEIL (d.x / 10) group#,
       ratio_to_report (d.x) OVER (PARTITION BY d.group#) oracle_rr,
       matt_ratio_to_report (
         d.x,
         matt_ratio_to_report_col (matt_ratio_to_report_rec (d.x)) OVER (PARTITION BY d.group#)) custom_rr
FROM   matt_test_data d )
SELECT /*+ PARALLEL */ test.*, case when test.oracle_rr != test.custom_rr then 'Mismatch!' Else null END test_results from test 
--where oracle_rr != custom_rr
ORDER BY test_results nulls last, x; 
like image 95
Matthew McPeak Avatar answered Oct 04 '22 23:10

Matthew McPeak


The only way I found to create a custom aggregate with multiple parameters is to create a new TYPE with the desired number of elements and then pass an instance of that type to the aggregate:

First define the structure to hold all the "parameters" you need:

create or replace type wrapper_type as object
(
   raw_grade integer,
   config_data varchar
);
/

Then create your aggregate:

CREATE OR REPLACE TYPE analytic_wrapper AS OBJECT
(
  .. variables you might need

  STATIC FUNCTION ODCIAggregateInitialize(actx IN OUT wrapper_type) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateIterate(self  IN OUT wrapper_type, val IN wrapper_type) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateTerminate(self IN wrapper_type, returnValue OUT number, flags IN NUMBER) RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateMerge(self  IN OUT wrapper_type, ctx2 IN  wrapper_type) RETURN NUMBER
);
/

Then you need to implement the actual aggregate logic in the type body. Once that is done, you can use something like this:

select analytic_wrapper(wrapper_type(G.raw_grade, P.course_config_data))
from ... 

The above was written more or less from memory, so I'm pretty sure it's full of syntax errors, but it should get you started.

More details and examples are in the manual: http://docs.oracle.com/cd/E11882_01/appdev.112/e10765/aggr_functions.htm#ADDCI026

The manual states that such an aggregate can be used as an analytical function:

When a user-defined aggregate is used as an analytic function, the aggregate is calculated for each row's corresponding window

like image 37
a_horse_with_no_name Avatar answered Oct 05 '22 01:10

a_horse_with_no_name