Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculate the percentage of the root owned by its parents

In simplified terms, I'm trying to calculate the percentage of the root of a tree owned by its parents, further up the tree. How can I do this in SQL alone?

Here's my (sample) schema. Please note that though the hierarchy itself is quite simple there is an additional holding_id, which means that a single parent can "own" different parts of their child.

create table hierarchy_test ( 
       id number -- "root" ID
     , parent_id number -- Parent of ID
     , holding_id number -- The ID can be split into multiple parts
     , percent_owned number (3, 2)
     , primary key (id, parent_id, holding_id) 
        );

And some sample data:

insert all 
 into hierarchy_test values (1, 2, 1, 1) 
 into hierarchy_test values (2, 3, 1, 0.25)
 into hierarchy_test values (2, 4, 1, 0.25)
 into hierarchy_test values (2, 5, 1, 0.1)
 into hierarchy_test values (2, 4, 2, 0.4)
 into hierarchy_test values (4, 5, 1, 1)
 into hierarchy_test values (5, 6, 1, 0.3)
 into hierarchy_test values (5, 7, 1, 0.2)
 into hierarchy_test values (5, 8, 1, 0.5)
select * from dual;

SQL Fiddle

The following query returns the calculation I would like to make. Due to the nature of SYS_CONNECT_BY_PATH it can't, to my knowledge, perform the calculation itself.

 select a.*, level as lvl
      , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
   from hierarchy_test a
  start with id = 1
connect by nocycle prior parent_id = id

There are cyclical relationships in the data, just not in this example.

At the moment I'm going to use a pretty simple function to turn the string in the calc column into a number

create or replace function some_sum ( P_Sum in varchar2 ) return number is
   l_result number;
begin  
   execute immediate 'select ' || P_Sum || ' from dual'
     into l_result;
     
   return l_result;   
end;
/

This seems to be a ridiculous way of going about it and I would rather avoid the additional time that will be taken parsing the dynamic SQL1.

Theoretically, I think, I should be able to use the MODEL clause to calculate this. My problem is caused by the non-uniqueness of the tree. One of my attempts at using the MODEL clause to do this is:

select *
  from ( select a.*, level as lvl
              , '1' || sys_connect_by_path(percent_owned, ' * ') as calc
           from hierarchy_test a
          start with id = 1
        connect by nocycle prior parent_id = id
                 )
 model
 dimension by (lvl ll, id ii)
 measures (percent_owned, parent_id )
 rules upsert all ( 
   percent_owned[any, any]
   order by ll, ii  = percent_owned[cv(ll), cv(ii)] * nvl( percent_owned[cv(ll) - 1, parent_id[cv(ll), cv(ii)]], 1)
               )

This, understandably, fails with the following:

ORA-32638: Non unique addressing in MODEL dimensions

Using UNIQUE SINGLE REFERENCE fails for a similar reason, namely that the ORDER BY clause is not unique.

tl;dr

Is there a simple way to calculate the percentage of the root of a tree owned by its parents using only SQL? If I'm on the right track with MODEL where am I going wrong?

1. I'd also like to avoid the PL/SQL SQL context-switch. I realise that this is a tiny amount of time but this is going to be difficult enough to do quickly without adding an additional few minutes a day.

like image 758
Ben Avatar asked Dec 10 '12 18:12

Ben


People also ask

Do you get 50% DNA from each parent?

After all, children inherit half of their DNA from each parent: 50 percent from mom (through an egg), and 50 percent from dad (through sperm).

How do you calculate your heritage percentage?

Multiply the number of foreign-born grandparents you have from each country by 25 to find the percentage of your heritage from those countries.

How much DNA do you share with your parents?

Like siblings, parents and children share 50 percent of their DNA with one another. While the shared DNA between full siblings includes 25 percent of the mother's DNA and 25 percent of the father's DNA, the DNA shared between a parent and child is 50 percent of that parent's DNA.

What percentage of DNA is from grandparents?

The percentage of DNA that you share with each grandparent is around 25%. It's true there are some pieces of DNA that are not passed on evenly from all 4 grandparents. But they overall make up a very small percentage of your total DNA.


2 Answers

In 11g, Probably something like-

SELECT a.*, LEVEL AS lvl
      ,XMLQuery( substr( sys_connect_by_path( percent_owned, '*' ), 2 ) RETURNING CONTENT).getnumberval() AS calc
   FROM hierarchy_test a
  START WITH id = 1
CONNECT BY nocycle PRIOR parent_id = id;

SQL Fiddle.

Or, as per your '1'|| trick-

SELECT a.*, LEVEL AS lvl
      , XMLQuery( ('1'|| sys_connect_by_path( percent_owned, '*' )) RETURNING CONTENT).getnumberval() AS calc
   FROM hierarchy_test a
  START WITH id = 1
CONNECT BY nocycle PRIOR parent_id = id;

Unfortunately in 10g, XMLQuery cannot accept functions and always expects a string literal for evaluation for example-

select XMLQuery('1*0.25' RETURNING CONTENT).getnumberval() as val 
  from dual;

works and returns 0.25, but

select XMLQuery(substr('*1*0.25',2) RETURNING CONTENT).getnumberval() as val
   from dual;

gives ORA-19102: XQuery string literal expected.

The query might get slower as the number of levels on a tree increases with an additional overhead of internal tree creation by XMLQuery itself. The most optimum method to achieve the result would still be a PL/SQL Function which by the way would work both in 10g and 11g.

like image 167
AnBisw Avatar answered Oct 15 '22 21:10

AnBisw


This deserves an answer; though beware we're operating under a few special circumstances.

The first thing to mention is that the best possible way to do this is with recursive sub-query factoring/a recursive CTE as per Daniel Hilgarth and jonearles in the comments:

with temp (id, parent_id, percent_owned, calc) as (
  select a.id, a.parent_id, a.percent_owned, percent_owned as calc
    from hierarchy_test a
   where id = 1
   union all
  select a.id, a.parent_id, a.percent_owned, a.percent_owned * t.calc as calc
    from temp t
    join hierarchy_test a
      on t.parent_id = a.id
         )
select * 
  from temp

Their SQL Fiddle..

Unfortunately the complexity of the query and the size of the data we#re working with is such that this turned out to be impossible. There was no way to do it without full-scanning some overly large tables each time.

This doesn't necessarily mean that we're back to CONNECT BY. There is the opportunity to calculate hierarchies in bulk. Unfortunately, this turned out to be impossible as well; an hour in the database crashed. Thrice. We were using up almost 100GB of UNDO and the server just couldn't cope.

These are the special circumstances; we have to calculate hundreds of thousands of hierarchies in a few hours, at most. The average one is about 1.5 levels deep with maybe 5-10 leaves and 8-12 nodes in total. However, the outliers have 90k nodes, 27 levels and multiple cyclic relationships. The outliers aren't anywhere near rare enough.

So, CONNECT BY. Benchmarking Annjawn's solution against the PL/SQL EXECUTE IMMEDIATE suggested in the question indicated that for an above average tree XMLQuery() was up to 4 times slower. Excellent, had the answer; no other option; leave it at that.

Not.

Because we're calculating so many hierarchies with so many nodes we ended up getting overly long waits from library cache pin locks caused by the constant hard-parsing of hundreds of thousands of mathematical functions in EXECUTE IMMEDIATE.

No obvious response to that, so going back too Annjawn's solution it ends up 3 times quicker! The library cache pin locks completely disappear and we're back on the straight and narrow.

Not.

Unfortunately, there seems to be an Oracle bug in 11.2 that appears when you combine CONNECT BY, XMLQuery() and DBMS_SCHEDULER. In certain circumstances, normally in the larger hierarchies, it leaks huge amounts of memory. Lost the database and the server finding that one out. A report's been raised with Oracle and we're testing in 12c; though the memory leaks exhibit less, they still appear so 12c is out.

The solution? Wrap the XMLQuery() in a PL/SQL function. Memory leak solved, unfortunately this caused a large amount of contention for this function, and we started getting multi-hour Library cache: mutex x waits.. Querying x$kglob confirmed it was XMLTYPE that was getting hammered.

Andrey Nikolaev recommends either altering the system; rather not do that when everything else works fine, or using the DBMS_POOL.MARKHOT procedure to tell Oracle that you'll be accessing this object a lot. To the casual eye, this may have solved the issue, however, after about 10 minutes, and going through what seemed to be every lock that Oracle has, we ended up with 5 processes contending for CPU. Apparently there wasn't enough (54GB and 24 cores on the test box)...

We then started getting Cursor pin: s waits. Burleson recommends more hidden parameter finangling, Jonathan Lewis suggests that it's down to SGA resizing. As the DB was using automated SGA sizing we tried gradually increasing the shared pool, up to 30GB and only got back out old friend the Library cache: mutex x wait.

So, what's the solution? Who knows is the honest answer, but a Java stored procedure is working brilliantly so far, no memory leaks, no waits and significantly quicker than everything else.

I'm sure there's more out there... and I'd really like to get the MODEL clause to work if anyone has any ideas?

P.S. I can't claim credit for all of this; it's the work of about 3 people to get us to this stage...

like image 45
Ben Avatar answered Oct 15 '22 21:10

Ben