Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OLAP - Calculate run-off triangles, sample data and cube included (PostgreSQL/Mondrian)

Reality description: We do have a list of projects. In every project there is a lot of accounts. You can do a lot of actions on every account. I do have the following dimensions and fact table defined (simplified):

Dimensions and attributes:
 Project
  project_key
  project_name
  industry
  number_of_accounts
 Distance
  distance_key
  distance_in_months
  distance_in_quarters
 Account
  account_key
  project_key
  account_id
Fact Table and attributes:
 Action_Fact_Table
  project_key
  distance_key
  account_key
  action_id

Now, I would like to use run-off triangle approach to analyze data (it's maybe not the real run-off triangle, but the approach is the same). The most simple triangle would look like:

              Distance in Months
Project name|     1    2    3    4    5    6    7    8    9    10
-------------------------------------------------------------------------
 Project1   |     5   10   15   20   25   30   35   40   45    50
 Project2   |     7   14   21   28   35   42   49   56   63
 Project3   |     2    5    8   11   14   20   25   30
 Project4   |     0    2    5   10   18   23   40
 Project5   |     5   12   18   20   21   30

There is running sum of number of actions by rows. Distance in months shows the distance between date of action and project start date. You can obviously create similar triangle using distance in quarters (or any other period defined in distance dimension).

You can also create triangel for different level of hierarchy in project dimension, e.g. industry (Project1-Project3 = Industry1, Project4-Project5 = Industry2):

              Distance in Months
Project name|     1    2    3    4    5    6    7    8    9    10
-------------------------------------------------------------------------
 Industry1   |    14   29   44   59   74   92  109  126  108   50
 Industry2   |     5   14   23   30   39   53   40   

There is also more advanced run-off triangle where you divide running sum of actions by number of accounts. Assume there is the following number of accounts for our projects:

Project_name number_of_accounts  
-----------------------------
Project1     100
Project2     100
Project3     100
Project4     100
Project5     200

Then I would like to get the following triangle:

              Distance in Months
Project    |     1    2    3    4    5    6    7    8    9    10
------------------------------------------------------------------------
 Project1  |   .05  .01  .15  .20  .25  .30  .35  .40  .45   .50
 Project2  |    .7  .14  .21  .28  .35  .42  .49  .56  .63
 Project3  |    .2   .5   .8  .11  .14  .20  .25  .30
 Project4  |    .0   .2   .5  .10  .18  .23  .40
 Project5  |   .05  .06  .09  .10 .105  .15

This is especially usefull when you would like to compare projects and their actions in case where number of accounts in project is not the same for all projects.

The question is whether it is possible to create such calculation in OLAP. I was thinking I can use number_of_accounts in project table, but I can't figure it out. The other option is to aggregate data in account dimension. I was also not able to find anything using google, maybe because I'm asking wrong question.

The solution to this question is widely applicable in many industries, it is crucial especially in insurance and banking. It can be used everywhere where processes have long performance window and can be tracked by well defined, comparable batches of units.

(We are using PostgreSQL, Saiku, cubes are defined in Schema Workbench)

Test data (PostgreSQL syntax, let me know if you need something else)

--drop table if exists project cascade;
create table project (
  project_key int primary key,
  project_name character varying,
  industry character varying,
  number_of_accounts int
);

--drop table if exists distance cascade;
create table distance (
  distance_key int primary key,
  distance_in_months int,
  distance_in_quarters int);

--drop table if exists account cascade;
create table account (
  account_key int primary key,
  project_key int references project (project_key)
);

--drop table if exists action_fact_table cascade;
create table action_fact_table (
  project_key int references project (project_key),
  distance_key int references distance (distance_key),
  account_key int references account (account_key),
  action_id int
);

-- project data
insert into project values (1,'Project1','Industry1',100);
insert into project values (2,'Project2','Industry1',100);
insert into project values (3,'Project3','Industry1',100);
insert into project values (4,'Project4','Industry2',100);
insert into project values (5,'Project5','Industry2',200);

-- distance data
insert into distance values(1,1,1);
insert into distance values(2,2,1);
insert into distance values(3,3,1);
insert into distance values(4,4,2);
insert into distance values(5,5,2);
insert into distance values(6,6,2);
insert into distance values(7,7,3);
insert into distance values(8,8,3);
insert into distance values(9,9,3);
insert into distance values(10,10,4);
insert into distance values(11,11,4);
insert into distance values(12,12,4);

-- account data
/* let me know if you need insert statement for every row */
insert into account (
select generate_series (1,100), 1 union all
select generate_series (101,200), 2 union all
select generate_series (201,300), 3 union all
select generate_series (301,400), 4 union all
select generate_series (401,600), 5
);

insert into action_fact_table values(1,1,90,10001);
insert into action_fact_table values(1,1,32,10002);
insert into action_fact_table values(1,1,41,10003);
insert into action_fact_table values(1,1,54,10004);
insert into action_fact_table values(1,1,45,10005);
insert into action_fact_table values(1,2,22,10006);
insert into action_fact_table values(1,2,29,10007);
insert into action_fact_table values(1,2,41,10008);
insert into action_fact_table values(1,2,89,10009);
insert into action_fact_table values(1,2,15,10010);
insert into action_fact_table values(1,3,32,10011);
insert into action_fact_table values(1,3,100,10012);
insert into action_fact_table values(1,3,72,10013);
insert into action_fact_table values(1,3,80,10014);
insert into action_fact_table values(1,3,10,10015);
insert into action_fact_table values(1,4,12,10016);
insert into action_fact_table values(1,4,45,10017);
insert into action_fact_table values(1,4,83,10018);
insert into action_fact_table values(1,4,42,10019);
insert into action_fact_table values(1,4,33,10020);
insert into action_fact_table values(1,5,22,10021);
insert into action_fact_table values(1,5,27,10022);
insert into action_fact_table values(1,5,59,10023);
insert into action_fact_table values(1,5,32,10024);
insert into action_fact_table values(1,5,70,10025);
insert into action_fact_table values(1,6,32,10026);
insert into action_fact_table values(1,6,5,10027);
insert into action_fact_table values(1,6,15,10028);
insert into action_fact_table values(1,6,70,10029);
insert into action_fact_table values(1,6,43,10030);
insert into action_fact_table values(1,7,59,10031);
insert into action_fact_table values(1,7,9,10032);
insert into action_fact_table values(1,7,99,10033);
insert into action_fact_table values(1,7,79,10034);
insert into action_fact_table values(1,7,31,10035);
insert into action_fact_table values(1,8,56,10036);
insert into action_fact_table values(1,8,34,10037);
insert into action_fact_table values(1,8,48,10038);
insert into action_fact_table values(1,8,79,10039);
insert into action_fact_table values(1,8,42,10040);
insert into action_fact_table values(1,9,10,10041);
insert into action_fact_table values(1,9,10,10042);
insert into action_fact_table values(1,9,49,10043);
insert into action_fact_table values(1,9,61,10044);
insert into action_fact_table values(1,9,49,10045);
insert into action_fact_table values(1,10,99,10046);
insert into action_fact_table values(1,10,69,10047);
insert into action_fact_table values(1,10,84,10048);
insert into action_fact_table values(1,10,99,10049);
insert into action_fact_table values(1,10,3,10050);
insert into action_fact_table values(2,1,182,10051);
insert into action_fact_table values(2,1,127,10052);
insert into action_fact_table values(2,1,197,10053);
insert into action_fact_table values(2,1,174,10054);
insert into action_fact_table values(2,1,187,10055);
insert into action_fact_table values(2,1,144,10056);
insert into action_fact_table values(2,1,160,10057);
insert into action_fact_table values(2,2,155,10058);
insert into action_fact_table values(2,2,153,10059);
insert into action_fact_table values(2,2,119,10060);
insert into action_fact_table values(2,2,188,10061);
insert into action_fact_table values(2,2,125,10062);
insert into action_fact_table values(2,2,147,10063);
insert into action_fact_table values(2,2,123,10064);
insert into action_fact_table values(2,3,136,10065);
insert into action_fact_table values(2,3,163,10066);
insert into action_fact_table values(2,3,187,10067);
insert into action_fact_table values(2,3,138,10068);
insert into action_fact_table values(2,3,168,10069);
insert into action_fact_table values(2,3,132,10070);
insert into action_fact_table values(2,3,138,10071);
insert into action_fact_table values(2,4,158,10072);
insert into action_fact_table values(2,4,171,10073);
insert into action_fact_table values(2,4,153,10074);
insert into action_fact_table values(2,4,141,10075);
insert into action_fact_table values(2,4,182,10076);
insert into action_fact_table values(2,4,165,10077);
insert into action_fact_table values(2,4,143,10078);
insert into action_fact_table values(2,5,190,10079);
insert into action_fact_table values(2,5,181,10080);
insert into action_fact_table values(2,5,163,10081);
insert into action_fact_table values(2,5,134,10082);
insert into action_fact_table values(2,5,145,10083);
insert into action_fact_table values(2,5,190,10084);
insert into action_fact_table values(2,5,198,10085);
insert into action_fact_table values(2,6,137,10086);
insert into action_fact_table values(2,6,133,10087);
insert into action_fact_table values(2,6,135,10088);
insert into action_fact_table values(2,6,103,10089);
insert into action_fact_table values(2,6,187,10090);
insert into action_fact_table values(2,6,127,10091);
insert into action_fact_table values(2,6,117,10092);
insert into action_fact_table values(2,7,116,10093);
insert into action_fact_table values(2,7,139,10094);
insert into action_fact_table values(2,7,111,10095);
insert into action_fact_table values(2,7,150,10096);
insert into action_fact_table values(2,7,151,10097);
insert into action_fact_table values(2,7,181,10098);
insert into action_fact_table values(2,7,109,10099);
insert into action_fact_table values(2,8,102,10100);
insert into action_fact_table values(2,8,101,10101);
insert into action_fact_table values(2,8,118,10102);
insert into action_fact_table values(2,8,147,10103);
insert into action_fact_table values(2,8,186,10104);
insert into action_fact_table values(2,8,136,10105);
insert into action_fact_table values(2,8,160,10106);
insert into action_fact_table values(2,9,149,10107);
insert into action_fact_table values(2,9,119,10108);
insert into action_fact_table values(2,9,169,10109);
insert into action_fact_table values(2,9,176,10110);
insert into action_fact_table values(2,9,195,10111);
insert into action_fact_table values(2,9,183,10112);
insert into action_fact_table values(2,9,140,10113);
insert into action_fact_table values(3,1,224,10114);
insert into action_fact_table values(3,1,241,10115);
insert into action_fact_table values(3,2,295,10116);
insert into action_fact_table values(3,2,249,10117);
insert into action_fact_table values(3,2,260,10118);
insert into action_fact_table values(3,3,298,10119);
insert into action_fact_table values(3,3,267,10120);
insert into action_fact_table values(3,3,297,10121);
insert into action_fact_table values(3,4,211,10122);
insert into action_fact_table values(3,4,253,10123);
insert into action_fact_table values(3,4,214,10124);
insert into action_fact_table values(3,5,248,10125);
insert into action_fact_table values(3,5,223,10126);
insert into action_fact_table values(3,5,288,10127);
insert into action_fact_table values(3,6,207,10128);
insert into action_fact_table values(3,6,296,10129);
insert into action_fact_table values(3,6,221,10130);
insert into action_fact_table values(3,6,201,10131);
insert into action_fact_table values(3,6,227,10132);
insert into action_fact_table values(3,6,209,10133);
insert into action_fact_table values(3,7,267,10134);
insert into action_fact_table values(3,7,282,10135);
insert into action_fact_table values(3,7,215,10136);
insert into action_fact_table values(3,7,285,10137);
insert into action_fact_table values(3,7,212,10138);
insert into action_fact_table values(3,8,239,10139);
insert into action_fact_table values(3,8,294,10140);
insert into action_fact_table values(3,8,296,10141);
insert into action_fact_table values(3,8,251,10142);
insert into action_fact_table values(3,8,281,10143);
insert into action_fact_table values(4,2,392,10144);
insert into action_fact_table values(4,2,347,10145);
insert into action_fact_table values(4,3,318,10146);
insert into action_fact_table values(4,3,400,10147);
insert into action_fact_table values(4,3,378,10148);
insert into action_fact_table values(4,4,315,10149);
insert into action_fact_table values(4,4,318,10150);
insert into action_fact_table values(4,4,394,10151);
insert into action_fact_table values(4,4,382,10152);
insert into action_fact_table values(4,4,317,10153);
insert into action_fact_table values(4,5,314,10154);
insert into action_fact_table values(4,5,354,10155);
insert into action_fact_table values(4,5,338,10156);
insert into action_fact_table values(4,5,375,10157);
insert into action_fact_table values(4,5,317,10158);
insert into action_fact_table values(4,5,329,10159);
insert into action_fact_table values(4,5,342,10160);
insert into action_fact_table values(4,5,380,10161);
insert into action_fact_table values(4,6,313,10162);
insert into action_fact_table values(4,6,311,10163);
insert into action_fact_table values(4,6,336,10164);
insert into action_fact_table values(4,6,380,10165);
insert into action_fact_table values(4,6,355,10166);
insert into action_fact_table values(4,7,386,10167);
insert into action_fact_table values(4,7,322,10168);
insert into action_fact_table values(4,7,311,10169);
insert into action_fact_table values(4,7,367,10170);
insert into action_fact_table values(4,7,350,10171);
insert into action_fact_table values(4,7,384,10172);
insert into action_fact_table values(4,7,391,10173);
insert into action_fact_table values(4,7,331,10174);
insert into action_fact_table values(4,7,373,10175);
insert into action_fact_table values(4,7,314,10176);
insert into action_fact_table values(4,7,305,10177);
insert into action_fact_table values(4,7,331,10178);
insert into action_fact_table values(4,7,350,10179);
insert into action_fact_table values(4,7,376,10180);
insert into action_fact_table values(4,7,387,10181);
insert into action_fact_table values(4,7,312,10182);
insert into action_fact_table values(4,7,397,10183);
insert into action_fact_table values(5,1,404,10184);
insert into action_fact_table values(5,1,562,10185);
insert into action_fact_table values(5,1,511,10186);
insert into action_fact_table values(5,1,594,10187);
insert into action_fact_table values(5,1,541,10188);
insert into action_fact_table values(5,2,506,10189);
insert into action_fact_table values(5,2,427,10190);
insert into action_fact_table values(5,2,481,10191);
insert into action_fact_table values(5,2,463,10192);
insert into action_fact_table values(5,2,579,10193);
insert into action_fact_table values(5,2,455,10194);
insert into action_fact_table values(5,2,527,10195);
insert into action_fact_table values(5,3,465,10196);
insert into action_fact_table values(5,3,562,10197);
insert into action_fact_table values(5,3,434,10198);
insert into action_fact_table values(5,3,401,10199);
insert into action_fact_table values(5,3,464,10200);
insert into action_fact_table values(5,3,500,10201);
insert into action_fact_table values(5,4,554,10202);
insert into action_fact_table values(5,4,600,10203);
insert into action_fact_table values(5,5,483,10204);
insert into action_fact_table values(5,6,552,10205);
insert into action_fact_table values(5,6,565,10206);
insert into action_fact_table values(5,6,586,10207);
insert into action_fact_table values(5,6,544,10208);
insert into action_fact_table values(5,6,436,10209);
insert into action_fact_table values(5,6,531,10210);
insert into action_fact_table values(5,6,409,10211);
insert into action_fact_table values(5,6,524,10212);
insert into action_fact_table values(5,6,564,10213);

Sample cube (Mondrian):

<Schema name="RunoffTriangleSchema">
  <Cube name="RunoffTriangleCube" visible="true" cache="true" enabled="true">
    <Table name="action_fact_table" schema="public">
    </Table>
    <Dimension type="StandardDimension" visible="true" foreignKey="project_key" name="Project">
      <Hierarchy name="Project" visible="true" hasAll="true">
        <Table name="project" schema="public" alias="">
        </Table>
        <Level name="Industry" visible="true" column="industry" uniqueMembers="false">
        </Level>
        <Level name="Project Name" visible="true" column="project_name" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Dimension type="StandardDimension" visible="true" foreignKey="distance_key" name="Distance">
      <Hierarchy name="Distance" visible="true" hasAll="true">
        <Table name="distance" schema="public" alias="">
        </Table>
        <Level name="Distance In Quarters" visible="true" column="distance_in_quarters" uniqueMembers="false">
        </Level>
        <Level name="Distance In Months" visible="true" column="distance_in_months" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Dimension type="StandardDimension" visible="true" foreignKey="account_key" name="Account">
      <Hierarchy name="Account" visible="true" hasAll="true">
        <Table name="account" schema="public">
        </Table>
        <Level name="Account Key" visible="true" column="account_key" uniqueMembers="false">
        </Level>
      </Hierarchy>
    </Dimension>
    <Measure name="CountActions" column="action_id" aggregator="count" visible="true">
    </Measure>
  </Cube>
</Schema>
like image 363
Tomas Greif Avatar asked Nov 13 '12 10:11

Tomas Greif


1 Answers

Two bounties and no answer, I am suprised. I've found a workaround solution - using SQL and BIRT engine I am now close to what I was looking for. I still hope that someone can solve this for OLAP.


To make this work, I have:

  • Custom function to return dynamically selected columns
  • SQL to calculate run-off triangle data based on selected columns
  • Report in BIRT 2.6.1 to display results and provide interface for parameters selection

Dynamically return columns

    CREATE or replace FUNCTION bizdata.getColumns(_column1 text, _column2 text, _column3 text, _column4 text, _table text, _rqdl text)
      RETURNS TABLE(cmf1 text, cmf2 text, cmf3 text, outval numeric, rqdl text) AS $$
    BEGIN
        RETURN QUERY EXECUTE 
            'SELECT ' 
                || case when _column1 = 'None' then quote_literal('None') else quote_ident(_column1) end || '::text as cmf1,' 
                || case when _column2 = 'None' then quote_literal('None') else quote_ident(_column2) end || '::text as cmf2,' 
                || case when _column3 = 'None' then quote_literal('None') else quote_ident(_column3) end || '::text as cmf3,'   
                || quote_ident(_column4) || '::numeric as baseline,'
                || case when _rqdl = 'None' then 0::text else quote_ident(_rqdl)::text end || '::text as rqdl'  
            ' FROM '
                || 'bizdata.' || _table; 
    END;
     $$ LANGUAGE plpgsql;

Thi function takes the following as input variables:

 - _column1 - common mapping field number 1
 - _column2 - common mapping field number 2
 - _column3 - common mapping field number 3
 - _column4 - column used for aggregation (sum)
 - _table - table used for getting data
 - _rqdl - requested distance level

Calculate data

Using bizdata.getColumns() function I can calculate triangle data using the following statement:


with 

params as (
    select 'cmf1'::varchar as prm_name, 'project_owner_name_short'::varchar as prm_value union all 
    select 'cmf2'::varchar as prm_name, 'project_source_name_short'::varchar as prm_value union all
    select 'cmf3'::varchar as prm_name, 'None'::varchar as prm_value union all
    select 'fact'::varchar as prm_name, 'amount'::varchar as prm_value union all    
    select 'fact_table'::varchar as prm_name, 'dwv_daily_allocation_fact'::varchar as prm_value union all       
    select 'baseline'::varchar as prm_name, 'tmp_nominal_value'::varchar as prm_value union all 
    select 'baseline_table'::varchar as prm_name, 'dw_project'::varchar as prm_value union all
    select 'rqdl'::varchar as prm_name, 'year_distance'::varchar as prm_value 
)

,baseline_data as (
    select 
        cmf1,
        cmf2,
        cmf3,
        sum(coalesce(outval,0)) as baseline
    from 
        bizdata.getColumns(
            (select prm_value from params where prm_name = 'cmf1'::text),
            (select prm_value from params where prm_name = 'cmf2'::text),
            (select prm_value from params where prm_name = 'cmf3'::text),
            (select prm_value from params where prm_name = 'baseline'::text),
            (select prm_value from params where prm_name = 'baseline_table'::text), 
            'None'
            )
    group by
        cmf1,
        cmf2,
        cmf3

)




,fact_data as (
    select 
        cmf1,
        cmf2,
        cmf3,
        rqdl::int as rqdl,
        sum(coalesce(outval,0)) as fact
    from 
        bizdata.getColumns(
            (select prm_value from params where prm_name = 'cmf1'::text),
            (select prm_value from params where prm_name = 'cmf2'::text),
            (select prm_value from params where prm_name = 'cmf3'::text),
            (select prm_value from params where prm_name = 'fact'::text),
            (select prm_value from params where prm_name = 'fact_table'::text),
            (select prm_value from params where prm_name = 'rqdl'::text)
            )
    group by
        cmf1,
        cmf2,
        cmf3,
        rqdl

)

select 
    case when cmf1 = 'None' then null else cmf1 end as cmf1,
    case when cmf2 = 'None' then null else cmf1 end as cmf,
    case when cmf3 = 'None' then null else cmf1 end as cmf1,
    rqdl,
    fact,
    baseline,
    sum(fact) over (partition by cmf1, cmf2, cmf3 order by rqdl) as cfact,
    sum(fact) over (partition by cmf1, cmf2, cmf3 order by rqdl) / baseline as cfactpct
from 
    fact_data 
    join baseline_data using (cmf1, cmf2, cmf3)

You can see that I can use up to 3 grouping variables (cmf1, cmf2, cmf3) and choose any distance attribute (as long as the attribute is available in dwv_daily_allocation_fact. Grouping variables should be available both in baseline table and fact table (to get common group level)

Report

Last step is to create report in BIRT (2.6.1) where parameters in params part of SQL are replaced by dataset parameters and linked to report parameters. Those who are using BIRT probably understand, other have to find other way.

Parameters selection GUI Parameters selection GUI

Output report enter image description here

I still have to figure out correct sorting of table (so groups with longest history are first.

Edit: I've figured out sorting in BIRT crosstab, now it looks like real triangle:

enter image description here

Let me know if you need more detailed description on how I did this.

like image 59
Tomas Greif Avatar answered Nov 15 '22 05:11

Tomas Greif