In simplified terms, I'm trying to calculate the percentage of the root of a tree owned by its parents, further up the tree. How can I do this in SQL alone?
Here's my (sample) schema. Please note that though the hierarchy itself is quite simple there is an additional holding_id
, which means that a single parent can "own" different parts of their child.
create table hierarchy_test (
id number -- "root" ID
, parent_id number -- Parent of ID
, holding_id number -- The ID can be split into multiple parts
, percent_owned number (3, 2)
, primary key (id, parent_id, holding_id)
);
And some sample data:
insert all
into hierarchy_test values (1, 2, 1, 1)
into hierarchy_test values (2, 3, 1, 0.25)
into hierarchy_test values (2, 4, 1, 0.25)
into hierarchy_test values (2, 5, 1, 0.1)
into hierarchy_test values (2, 4, 2, 0.4)
into hierarchy_test values (4, 5, 1, 1)
into hierarchy_test values (5, 6, 1, 0.3)
into hierarchy_test values (5, 7, 1, 0.2)
into hierarchy_test values (5, 8, 1, 0.5)
select * from dual;
SQL Fiddle
The following query returns the calculation I would like to make. Due to the nature of SYS_CONNECT_BY_PATH it can't, to my knowledge, perform the calculation itself.
select a.*, level as lvl
, '1' || sys_connect_by_path(percent_owned, ' * ') as calc
from hierarchy_test a
start with id = 1
connect by nocycle prior parent_id = id
There are cyclical relationships in the data, just not in this example.
At the moment I'm going to use a pretty simple function to turn the string in the calc
column into a number
create or replace function some_sum ( P_Sum in varchar2 ) return number is
l_result number;
begin
execute immediate 'select ' || P_Sum || ' from dual'
into l_result;
return l_result;
end;
/
This seems to be a ridiculous way of going about it and I would rather avoid the additional time that will be taken parsing the dynamic SQL1.
Theoretically, I think, I should be able to use the MODEL clause to calculate this. My problem is caused by the non-uniqueness of the tree. One of my attempts at using the MODEL clause to do this is:
select *
from ( select a.*, level as lvl
, '1' || sys_connect_by_path(percent_owned, ' * ') as calc
from hierarchy_test a
start with id = 1
connect by nocycle prior parent_id = id
)
model
dimension by (lvl ll, id ii)
measures (percent_owned, parent_id )
rules upsert all (
percent_owned[any, any]
order by ll, ii = percent_owned[cv(ll), cv(ii)] * nvl( percent_owned[cv(ll) - 1, parent_id[cv(ll), cv(ii)]], 1)
)
This, understandably, fails with the following:
ORA-32638: Non unique addressing in MODEL dimensions
Using UNIQUE SINGLE REFERENCE fails for a similar reason, namely that the ORDER BY clause is not unique.
Is there a simple way to calculate the percentage of the root of a tree owned by its parents using only SQL? If I'm on the right track with MODEL where am I going wrong?
1. I'd also like to avoid the PL/SQL SQL context-switch. I realise that this is a tiny amount of time but this is going to be difficult enough to do quickly without adding an additional few minutes a day.
After all, children inherit half of their DNA from each parent: 50 percent from mom (through an egg), and 50 percent from dad (through sperm).
Multiply the number of foreign-born grandparents you have from each country by 25 to find the percentage of your heritage from those countries.
Like siblings, parents and children share 50 percent of their DNA with one another. While the shared DNA between full siblings includes 25 percent of the mother's DNA and 25 percent of the father's DNA, the DNA shared between a parent and child is 50 percent of that parent's DNA.
The percentage of DNA that you share with each grandparent is around 25%. It's true there are some pieces of DNA that are not passed on evenly from all 4 grandparents. But they overall make up a very small percentage of your total DNA.
In 11g, Probably something like-
SELECT a.*, LEVEL AS lvl
,XMLQuery( substr( sys_connect_by_path( percent_owned, '*' ), 2 ) RETURNING CONTENT).getnumberval() AS calc
FROM hierarchy_test a
START WITH id = 1
CONNECT BY nocycle PRIOR parent_id = id;
SQL Fiddle.
Or, as per your '1'||
trick-
SELECT a.*, LEVEL AS lvl
, XMLQuery( ('1'|| sys_connect_by_path( percent_owned, '*' )) RETURNING CONTENT).getnumberval() AS calc
FROM hierarchy_test a
START WITH id = 1
CONNECT BY nocycle PRIOR parent_id = id;
Unfortunately in 10g, XMLQuery
cannot accept functions and always expects a string literal for evaluation for example-
select XMLQuery('1*0.25' RETURNING CONTENT).getnumberval() as val
from dual;
works and returns 0.25
, but
select XMLQuery(substr('*1*0.25',2) RETURNING CONTENT).getnumberval() as val
from dual;
gives ORA-19102: XQuery string literal expected
.
The query might get slower as the number of levels on a tree increases with an additional overhead of internal tree creation by XMLQuery
itself. The most optimum method to achieve the result would still be a PL/SQL Function which by the way would work both in 10g and 11g.
This deserves an answer; though beware we're operating under a few special circumstances.
The first thing to mention is that the best possible way to do this is with recursive sub-query factoring/a recursive CTE as per Daniel Hilgarth and jonearles in the comments:
with temp (id, parent_id, percent_owned, calc) as (
select a.id, a.parent_id, a.percent_owned, percent_owned as calc
from hierarchy_test a
where id = 1
union all
select a.id, a.parent_id, a.percent_owned, a.percent_owned * t.calc as calc
from temp t
join hierarchy_test a
on t.parent_id = a.id
)
select *
from temp
Their SQL Fiddle..
Unfortunately the complexity of the query and the size of the data we#re working with is such that this turned out to be impossible. There was no way to do it without full-scanning some overly large tables each time.
This doesn't necessarily mean that we're back to CONNECT BY
. There is the opportunity to calculate hierarchies in bulk. Unfortunately, this turned out to be impossible as well; an hour in the database crashed. Thrice. We were using up almost 100GB of UNDO and the server just couldn't cope.
These are the special circumstances; we have to calculate hundreds of thousands of hierarchies in a few hours, at most. The average one is about 1.5 levels deep with maybe 5-10 leaves and 8-12 nodes in total. However, the outliers have 90k nodes, 27 levels and multiple cyclic relationships. The outliers aren't anywhere near rare enough.
So, CONNECT BY
. Benchmarking Annjawn's solution against the PL/SQL EXECUTE IMMEDIATE
suggested in the question indicated that for an above average tree XMLQuery()
was up to 4 times slower. Excellent, had the answer; no other option; leave it at that.
Not.
Because we're calculating so many hierarchies with so many nodes we ended up getting overly long waits from library cache pin locks caused by the constant hard-parsing of hundreds of thousands of mathematical functions in EXECUTE IMMEDIATE
.
No obvious response to that, so going back too Annjawn's solution it ends up 3 times quicker! The library cache pin locks completely disappear and we're back on the straight and narrow.
Not.
Unfortunately, there seems to be an Oracle bug in 11.2 that appears when you combine CONNECT BY
, XMLQuery()
and DBMS_SCHEDULER. In certain circumstances, normally in the larger hierarchies, it leaks huge amounts of memory. Lost the database and the server finding that one out. A report's been raised with Oracle and we're testing in 12c; though the memory leaks exhibit less, they still appear so 12c is out.
The solution? Wrap the XMLQuery()
in a PL/SQL function. Memory leak solved, unfortunately this caused a large amount of contention for this function, and we started getting multi-hour Library cache: mutex x waits.. Querying x$kglob
confirmed it was XMLTYPE
that was getting hammered.
Andrey Nikolaev recommends either altering the system; rather not do that when everything else works fine, or using the DBMS_POOL.MARKHOT
procedure to tell Oracle that you'll be accessing this object a lot. To the casual eye, this may have solved the issue, however, after about 10 minutes, and going through what seemed to be every lock that Oracle has, we ended up with 5 processes contending for CPU. Apparently there wasn't enough (54GB and 24 cores on the test box)...
We then started getting Cursor pin: s waits. Burleson recommends more hidden parameter finangling, Jonathan Lewis suggests that it's down to SGA resizing. As the DB was using automated SGA sizing we tried gradually increasing the shared pool, up to 30GB and only got back out old friend the Library cache: mutex x wait.
So, what's the solution? Who knows is the honest answer, but a Java stored procedure is working brilliantly so far, no memory leaks, no waits and significantly quicker than everything else.
I'm sure there's more out there... and I'd really like to get the MODEL
clause to work if anyone has any ideas?
P.S. I can't claim credit for all of this; it's the work of about 3 people to get us to this stage...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With