Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Interesting Oracle analytic query challenge

I'm fairly experienced with Oracle analytic functions but this one has stumped me. I'll kick myself if there's an obvious solution :)

I have a table, JOURNAL, which records Inserts, Updates and Deletes on another table.

The table that it's a journal for is BOND_PAYMENTS, which represents links between PAYMENTS and BONDS; it stores the amount of money (AMOUNT) that was allocated to a particular bond (identified by BOND_NUMBER) from a particular payment (identified by PAYMENT_ID). In addition, it records what aspect of the bond it was allocated to (BOP_DOMAIN) which might be 'BON', 'PET', or some other code. The BOND_PAYMENTS table has a surrogate key (BOP_ID).

Therefore, my journal table will typically have 1 or more records for each BOP_ID - firstly, an INSert, followed perhaps by some UPDates, followed perhaps by a DELete.

Here is the JOURNAL table:

CREATE TABLE JOURNAL
( JN_DATE_TIME  DATE         NOT NULL,
  JN_OPERATION  VARCHAR2(3)  NOT NULL,
  BOP_ID        NUMBER(9)    NOT NULL,
  PAYMENT_ID    NUMBER(9)    NOT NULL,
  BOND_NUMBER   VARCHAR2(20) NOT NULL,
  BOP_DOMAIN    VARCHAR2(10) NOT NULL,
  AMOUNT        NUMBER(14,2) NOT NULL
);

Here is some sample data:

INSERT INTO JOURNAL VALUES (TO_DATE('01/01/2010','DD/MM/YYYY'),'INS',1242043,1003700,'9995/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('03/01/2010','DD/MM/YYYY'),'INS',1242046,1003700,'9998/10','BON',1700);
INSERT INTO JOURNAL VALUES (TO_DATE('04/01/2010','DD/MM/YYYY'),'INS',1242048,1003700,'9999/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('05/01/2010','DD/MM/YYYY'),'INS',1242052,1003700,'10003/10','BON',1600);
INSERT INTO JOURNAL VALUES (TO_DATE('08/01/2010','DD/MM/YYYY'),'INS',1242058,1003700,'9998/10','BON',100);
INSERT INTO JOURNAL VALUES (TO_DATE('09/01/2010','DD/MM/YYYY'),'UPD',1242058,1003700,'9998/10','PET',100);
INSERT INTO JOURNAL VALUES (TO_DATE('01/01/2010','DD/MM/YYYY'),'INS',2242043,1003701,'8995/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('02/01/2010','DD/MM/YYYY'),'INS',2242046,1003701,'8998/10','BON',1700);
INSERT INTO JOURNAL VALUES (TO_DATE('03/01/2010','DD/MM/YYYY'),'INS',2242048,1003701,'8999/10','BON',1800);
INSERT INTO JOURNAL VALUES (TO_DATE('04/01/2010','DD/MM/YYYY'),'INS',2242058,1003701,'8998/10','BON',100);
INSERT INTO JOURNAL VALUES (TO_DATE('05/01/2010','DD/MM/YYYY'),'UPD',2242046,1003701,'8998/10','BON',1500);
INSERT INTO JOURNAL VALUES (TO_DATE('06/01/2010','DD/MM/YYYY'),'INS',2242052,1003701,'9003/10','BON',1600);
INSERT INTO JOURNAL VALUES (TO_DATE('07/01/2010','DD/MM/YYYY'),'UPD',2242058,1003701,'8998/10','PET',200);

Now, I need to extract a full set of data from this journal table but in a slightly different format. The main requirement is that we don't want the journal table to record BOP_DOMAIN anymore - it's just not required.

I need to generate a history of the total amount for each BOND_PAYMENT record. I cannot use the BOND_PAYMENT table itself because it only shows the latest status of each record. I need to mine this info from the journal.

I can't just take a SUM(amount) over(partition by payment_id, bond_number) because an individual BOP_ID might be updated several times; so at any one moment in time only the latest amount recorded for that BOP_ID should be used.

Given the above sample data, here is an illustration what I'd expect to produce:

SELECT jn_date_time,
       jn_operation,
       bop_id,
       payment_id,
       bond_number,
       bop_domain,
       amount,
       ? as running_total
FROM   JOURNAL
ORDER BY jn_date_time;

sample data and expected results

Here I've reproduced on the left the sample data, for two sample payments. On the right I've got "Running Total", which is the expected output. Next to it (in red) is the logic how it calculated the running total for each row.

The "Running Total" is a snapshot, at the point in time of the journal entry, of the total amount for that combination of PAYMENT_ID and BOND_NUMBER. Remember, a particular BOP_ID might be updated several times; the total amount must consider only the most recent record for that BOP_ID.

Any solution that works will be acceptable, but I suspect an analytic function (or combination of analytic functions) will be the best way to solve this.

like image 695
Jeffrey Kemp Avatar asked Dec 02 '11 02:12

Jeffrey Kemp


1 Answers

Try this

WITH inner AS  
  (SELECT jn_date_time,
     jn_operation,
     bop_id,
     payment_id,
     bond_number,
     bop_domain,
     amount,
     amount - coalesce(lag(amount) over (partition by bop_id order by jn_date_time), 0)      
        as delta_bop_amount
  FROM   JOURNAL)
SELECT inner.*, 
sum(delta_bop_amount) 
  over (partition by payment_id, bond_number order by jn_date_time)  as running_total
FROM inner
ORDER BY bond_number, payment_id

This will return the same answer for your examples.

You need two passes - the analytic function in the inner query figures out how much each record changes the total for each BOP_ID. An INS is a straight addition, an UPD has to subtract the most recent value and add the new one.

The second pass then does a running total by bond/payment.

I'm assuming that you wanted to treat the bond/payment as a natural key for the running sum, and that there may be multiple BOP_ID's for any bond/payment combination.

like image 71
wrschneider Avatar answered Oct 10 '22 02:10

wrschneider