Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

aggregate function in recursive SQL

This question is an extended and simplified version of this question.

I have been trying to solve in SQL the following iterative equation:

U^{F,D}_{t,p} = (\sum_{D} U^{F,D}_{t-1,p} + C_{t-1,p} )*R^{F,D}_{t-1,p}

that results in:

enter image description here

The closest analogy I can think of is that U^{F,D}_{t,p} is a number of cars of brand F, of certain color (D), that a car dealer (p) has available at time t. So the above equation basically says: take the cars' units from the day before t-1 (i.e., U^{F,D}_{t-1,p}), sum over colors (\sum_{D}), then add to the sum a C value from the day before (C_{t-1,p}, whatever that is), and multiply by some other number R from the day before (R^{F,D}_{t-1,p}, whatever that is too).

Simplified problem

I have managed to solve a simplified form of the above equation, namely:

enter image description here

i.e., without the sum over cars' colors (D). The sample data, and the SQL query are in the fiddle that I link, but I paste it here for reference as well:

FULL DATA:

CREATE TABLE DYNAMICS ( T DATE, T_M1 DATE, P INTEGER, F VARCHAR(255), DELTA_F VARCHAR(255), R_T_M1 NUMBER, C_T_M1 NUMBER, U_T_M1 NUMBER, R_T NUMBER, C_T NUMBER, U_T NUMBER );  

-- DAY 1, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,100.0 );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,50.0 );  
-- DAY 1, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.5, 0.6, NULL, 0.7,0.8,10.0 );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('31.12.2014','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.5, 0.6, NULL, 0.7,0.8,5.0 );  
-- DAY 2, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.7, 0.8, 100, 0.9,0.9, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.7, 0.8, 50, 0.6,0.5, NULL );  
-- DAY 2, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, 10, 0.7,0.8, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('01.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.7, 0.8, 5, 0.3,0.3, NULL );  
-- DAY 3, P_1  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'BMW','RED', 0.9, 0.9, NULL, 0.2,0.3, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 1,'MERCEDES','RED', 0.6, 0.5, NULL, 1.7,1.8, NULL );  
-- DAY 3, P_2  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'BMW','RED', 0.7, 0.8, NULL, 0.2,0.3, NULL );  
INSERT INTO DYNAMICS VALUES ( TO_DATE('03.01.2015','DD.MM.YYYY HH24:MI:SS'), TO_DATE('02.01.2015','DD.MM.YYYY HH24:MI:SS'), 2,'MERCEDES','RED', 0.3, 0.3, NULL, 0.8,0.9, NULL );  

SAMPLE DATA:

The following demonstrates example data for car dealer p=1, car model F=BMW of color D=RED (D from the mathematical equation is called DELTA in SQL). The initial condition (t=0) is here 2015-01-01. For all the days t, all the parameters at t (R_T, C_T), and t-1 (R_T_M1, C_T_M1) are given. Knowing them, the task is to compute cars' units for all the days t > t=0.

|                         T |                       T_M1 | P |   F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |    100 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 1 | BMW |     RED |    0.7 |    0.8 |    100 | 0.9 | 0.9 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 1 | BMW |     RED |    0.9 |    0.9 | (null) | 0.2 | 0.3 | (null) |

QUERY:

In order to resolve the simplified problem, I have come up with the query in the linked fiddle that I paste here as well for reference:

-- 
-- SQL
-- T -> t 
-- T_M1 -> t-1 
-- 
WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.
    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
            U_T
    FROM DYNAMICS 
        -- Initial condition: U_{t-1} does not exist, and U_{t=0} is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.
    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T, 
            -- Here the magic happens, i.e., (U_{t-1} + C_{t-1})*R_{t-1} = U_{t}
            (RECU.U_T+NEW.C_T_M1)*NEW.R_T_M1 AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM  RECU ORDER BY P, F, T;

This query, for the example data pasted above, results in:

|                         T |                       T_M1 | P |   F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|-----|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 1 | BMW |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |    100 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 1 | BMW |     RED |    0.7 |    0.8 |    100 | 0.9 | 0.9 |  70.56 |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 1 | BMW |     RED |    0.9 |    0.9 |  70.56 | 0.2 | 0.3 | 64.314 |

Which works well, i.e., for: 2015-01-02, U_t = (100+0.8)*0.7 = 70.56, 2015-01-03, U_t = (70.56+0.9)*0.9 = 64.314.

The query is written in such a way that it works with different car dealers, and different car brands, which can be checked running the query in the linked fiddle

Coming back to the full problem

The query above cannot handle correctly the sum over cars' colors from the original equation:

enter image description here

This was irrelevant in the simplified data, since all cars (BMW and MERCEDES) occur there only in RED, and so the sum over colors effectively vanishes.

Such full logic should be probably implemented via a GROUP BY/SUM expression built in into the original query above. Unfortunately, I do not know how to do it.

So, imagine you have data in the shape like in the simplified problem section, but now every car brand exists in two colors, e.g., like in this linked fiddle:

|                         T |                       T_M1 | P |        F | DELTA_F | R_T_M1 | C_T_M1 | U_T_M1 | R_T | C_T |    U_T |
|---------------------------|----------------------------|---|----------|---------|--------|--------|--------|-----|-----|--------|
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES |   BLACK |    0.2 |    0.6 | (null) | 0.5 | 0.8 |    5.5 |
| January, 01 2015 00:00:00 | December, 31 2014 00:00:00 | 2 | MERCEDES |     RED |    0.5 |    0.6 | (null) | 0.7 | 0.8 |      5 |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 2 | MERCEDES |   BLACK |    0.5 |    0.8 |    5.5 | 1.3 | 0.5 | (null) |
| January, 02 2015 00:00:00 |  January, 01 2015 00:00:00 | 2 | MERCEDES |     RED |    0.7 |    0.8 |      5 | 4.3 | 0.5 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 2 | MERCEDES |   BLACK |    1.3 |    0.5 | (null) | 0.3 | 0.9 | (null) |
| January, 03 2015 00:00:00 |  January, 02 2015 00:00:00 | 2 | MERCEDES |     RED |    4.3 |    0.5 | (null) | 0.4 | 0.9 | (null) |

Given such data, you would expect for the dealer p=2 F=MERCEDES cars dynamics to look as follows:

U^{MERCEDES,BLACK}_{T=2015-01-02,P=2} = ( (5.5 + 5) + 0.8 )*0.5 = 11.3*0.5 = 5.65 
U^{MERCEDES,RED}_{T=2015-01-02,P=2} = ( (5.5 + 5) + 0.8 )*0.7 = 11.3*0.7 = 7.91

U^{MERCEDES,BLACK}_{T=2015-01-03,P=2} = ( (5.65 + 7.91) + 0.5 )*1.3 = 14.06*1.3 = 18.278
U^{MERCEDES,RED}_{T=2015-01-03,P=2} = ( (5.65 + 7.91) + 0.5 )*4.3 = 14.06*4.3 = 60.458

Question is how the simplified query above should be adjusted to solve this problem.

like image 845
Simon Righley Avatar asked Nov 22 '15 17:11

Simon Righley


People also ask

What are recursive functions in SQL?

A recursive query is one that is defined by a Union All with an initialization fullselect that seeds the recursion. The iterative fullselect contains a direct reference to itself in the FROM clause. There are additional restrictions as to what can be specified in the definition of a recursive query.

What is aggregate function in SQL with example?

An aggregate function in SQL performs a calculation on multiple values and returns a single value. SQL provides many aggregate functions that include avg, count, sum, min, max, etc. An aggregate function ignores NULL values when it performs the calculation, except for the count function.

What is an aggregate SQL function?

SQL aggregation is the task of collecting a set of values to return a single value. It is done with the help of aggregate functions, such as SUM, COUNT, and AVG. For example, in a database of products, you might want to calculate the average price of the whole inventory.


2 Answers

I do not think this is the best answer but I think it gives you the result that you are looking for.

WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.

    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
-- Start SUM of u_t
              (select sum(u_t) from DYNAMICS d2
               where d2.T=d1.T and d2.T_M1=d1.T_M1 and d2.P=d1.P and d2.F=d1.F
               group by T, T_M1, P, F) as u_t
-- End SUM of u_t   
    FROM DYNAMICS d1
        -- Initial condition: U_{t-1} does not exist, and U_{t=0} is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.

    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T
              , 
            -- Here the magic happens, i.e., (U_{t-1} + C_{t-1})*R_{t-1} = U_{t}
            (
              RECU.U_T
              +NEW.C_T_M1)*NEW.R_T_M1 AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM  RECU ORDER BY P, F, T;

What i've added is between Start SUM of u_t and End SUM of u_t comments and here is the fiddle.

like image 182
Cristian_I Avatar answered Oct 18 '22 02:10

Cristian_I


The solution turned out to be easier than I had thought (although I have spent one day trying all sorts of things, and now all seems to be trivial).

The query working (tested) on the original fiddle data reads:

WITH RECU(  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, U_T_M1, 
            R_T, C_T, U_T ) AS (
    -- Anchor member.
    SELECT  T, T_M1, P, F, DELTA_F, 
            R_T_M1, C_T_M1, 
            U_T_M1, 
            R_T, C_T, 
            U_T
    FROM DYNAMICS 
        -- Initial condition: U_{t-1} does not exist, and U_{t=0} is given
        WHERE  ( U_T_M1 IS NULL AND U_T IS NOT NULL )
    UNION ALL
    -- Recursive member.
    SELECT  NEW.T, NEW.T_M1, NEW.P, NEW.F, NEW.DELTA_F,  
            NEW.R_T_M1, NEW.C_T_M1, 
            RECU.U_T AS U_T_M1,
            NEW.R_T, NEW.C_T,
            -- Here the magic happens, i.e., (U_{t-1} + C_{t-1})*R_{t-1} = U_{t}
            ( (( SUM(RECU.U_T) OVER (PARTITION BY NEW.T, NEW.T_M1, NEW.P, NEW.F) ) + NEW.C_T_M1)*NEW.R_T_M1 ) AS U_T
    FROM DYNAMICS NEW 
    INNER JOIN RECU
    ON
        -- Translates: yesterday (t-1) of the new record equals today (t) of the parent record
        NEW.T_M1 = RECU.T AND 
        NEW.P = RECU.P AND 
        NEW.F = RECU.F AND 
        NEW.DELTA_F = RECU.DELTA_F 
)
SELECT * FROM RECU 
ORDER BY P, F, T, DELTA_F;

Which is a minimal change to the original query (only one line of the original query affected), and uses ORACLE analytic function.

like image 1
Simon Righley Avatar answered Oct 18 '22 03:10

Simon Righley