We have a requirement to do some computationally heavy lifting connected with an Oracle database. Until now we have performed our numerical calculations in PL/SQL and largely put up with the lack of performance.
Now that we are implementing a multi-echelon inventory model (https://www.researchgate.net/publication/222409130_Evaluation_of_time-varying_availability_in_multi-echelon_spare_parts_systems_with_passivation), I am much more concerned about performance because of the size of the problem.
I have implemented part of the algorithm in three languages: Fortran (90-2008, compiled with gfortran), VBA in Excel, and PL/SQL, and wrapped a one-million-call test loop around it. Even using the binary_double
data type and native compilation with PLSQL_CODE_TYPE=NATIVE
(both of which yield an improvement), the test code below still takes 37s to run (Oracle XE 11.2). By comparison, VBA takes 16s and Fortran 1.6s on the same hardware.
While it would probably be too much to ask for performance approaching the Fortran figure (though that would obviously be highly desirable), I am surprised that even humble VBA outperforms the PL/SQL.
So my question has two parts: is a performance gap of this size simply inherent to PL/SQL, and if not, what can I do to narrow it?
Also, though I am not directly after a critique of my source code per se, I would be grateful if anyone can spot any obvious optimizations I could incorporate.
The function timeebo is the test function, which I invoke with a simple select timeebo from dual; in SQL Developer.
create or replace FUNCTION gammln(
x IN binary_double)
RETURN binary_double
IS
--Lanczos' approximation to the Log Gamma function
gammln binary_double;
ser binary_double;
tmp binary_double;
BEGIN
tmp := x + 5.5;
tmp :=(x + 0.5) * Ln(tmp) - tmp;
ser := 1.000000000190015;
ser := ser + 76.18009172947146 /(x + 1.0) ;
ser := ser - 86.50532032941677 /(x + 2.0) ;
ser := ser + 24.01409824083091 /(x + 3.0) ;
ser := ser - 1.231739572450155 /(x + 4.0) ;
ser := ser + 1.208650973866179E-03 /(x + 5.0) ;
ser := ser - 5.395239384953E-06 /(x + 6.0) ;
RETURN tmp + Ln(2.5066282746310005 * ser / x) ;
END;
/
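As a cross-check on the constants (outside the question itself), the same six-coefficient Lanczos approximation can be transcribed into Python and compared against the standard library's math.lgamma; the two should agree to roughly ten significant digits, the accuracy usually quoted for this form.

```python
import math

# Direct transcription of the PL/SQL gammln above (same Lanczos coefficients).
def gammln(x):
    tmp = x + 5.5
    tmp = (x + 0.5) * math.log(tmp) - tmp
    ser = 1.000000000190015
    ser += 76.18009172947146 / (x + 1.0)
    ser -= 86.50532032941677 / (x + 2.0)
    ser += 24.01409824083091 / (x + 3.0)
    ser -= 1.231739572450155 / (x + 4.0)
    ser += 1.208650973866179e-03 / (x + 5.0)
    ser -= 5.395239384953e-06 / (x + 6.0)
    return tmp + math.log(2.5066282746310005 * ser / x)

# Agreement with math.lgamma across the argument range the EBO code uses
for x in (1.0, 10.0, 501.0):
    assert abs(gammln(x) - math.lgamma(x)) <= 1e-9 * max(1.0, abs(math.lgamma(x)))
```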
CREATE OR REPLACE FUNCTION PoissonDist(
k IN INTEGER,
lambda IN binary_double)
RETURN binary_double
IS
BEGIN
RETURN Exp((k * Ln(lambda)) - lambda - gammln(k + 1)) ;
END;
/
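For context: PoissonDist evaluates the Poisson pmf in log space, p(k; lambda) = exp(k*ln(lambda) - lambda - lnGamma(k+1)), which avoids overflow in lambda^k and k! for large k. A small Python sketch (substituting the stdlib math.lgamma for gammln) confirms it agrees with the direct formula lambda^k * e^(-lambda) / k!.

```python
import math

def poisson_dist(k, lam):
    # Log-space evaluation, mirroring the PL/SQL PoissonDist
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

# Compare against the textbook form, which would overflow for large k / lambda
for k, lam in ((0, 0.5), (3, 2.0), (20, 15.0)):
    direct = lam ** k * math.exp(-lam) / math.factorial(k)
    assert abs(poisson_dist(k, lam) - direct) <= 1e-12 * max(direct, 1e-300)
```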
CREATE OR REPLACE FUNCTION EBO(
stock IN pls_integer,
pipeline IN binary_double,
accuracy IN binary_double DEFAULT 0.0000000001)
RETURN binary_double
IS
i pls_integer;
EBO binary_double;
term binary_double;
temp binary_double;
PoissonVal binary_double;
peaked BOOLEAN; --Flag the Poisson curve as having peaked
BEGIN
EBO := 0.0;
IF(pipeline = 0.0) THEN
RETURN EBO;
END IF;
--Initialise
i := 1;
peaked := false;
PoissonVal := PoissonDist(stock + 1, pipeline) ; --Get p() value
IF(PoissonVal < accuracy AND floor(pipeline) > stock) THEN --If p() is very small...
i := floor(pipeline) - stock; --Revise i to just below peak of Poisson curve
PoissonVal := PoissonDist(stock + i, pipeline) ; --Get p() value close to peak
temp := PoissonVal *(pipeline / CAST(stock + i + 1 AS binary_double)) ; --Store Poisson value just above peak
LOOP
term := CAST(i AS binary_double) * PoissonVal;
EBO := EBO + term;
i := i - 1; --Work backwards
PoissonVal := PoissonVal *(CAST(stock + i + 1 AS binary_double) / pipeline) ; --Revise Poisson value for next time
EXIT WHEN(term < accuracy OR i = 0) ;
END LOOP;
i := 1 + floor(pipeline) - stock;
PoissonVal := temp;
peaked := true;
END IF;
LOOP
term := CAST(i AS binary_double) * PoissonVal;
EBO := EBO + term;
i := i + 1;
PoissonVal := PoissonVal *(pipeline / CAST(stock + i AS binary_double)) ; --Revise Poisson value for next time
IF(CAST(stock + i AS binary_double) > pipeline) THEN
peaked := true;
END IF;
EXIT WHEN(term < accuracy AND peaked) ;
END LOOP;
IF(EBO < accuracy) THEN
EBO := 0.0;
END IF;
RETURN EBO;
END;
/
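The quantity EBO computes is the expected number of backorders for stock level s and a Poisson-distributed pipeline lambda: EBO(s, lambda) = sum over x > s of (x - s) * p(x; lambda). As a reference (again, a sketch rather than the question's code), here is a plain Python version of the forward loop only, omitting the backward-iteration branch the PL/SQL uses when p(s+1) underflows; it relies on the recurrence p(k+1) = p(k) * lambda / (k+1) to avoid re-evaluating the pmf every iteration.

```python
import math

def ebo(stock, pipeline, accuracy=1e-10):
    # Expected backorders: sum of i * p(stock + i; pipeline) for i = 1, 2, ...
    if pipeline == 0.0:
        return 0.0
    # p(stock + 1), evaluated once in log space
    p = math.exp((stock + 1) * math.log(pipeline) - pipeline
                 - math.lgamma(stock + 2))
    total, i, peaked = 0.0, 1, False
    while True:
        term = i * p
        total += term
        i += 1
        p *= pipeline / (stock + i)   # Poisson recurrence: next pmf value
        if stock + i > pipeline:      # past the mode, terms now shrink
            peaked = True
        if term < accuracy and peaked:
            break
    return total

# Sanity check: with zero stock every pipeline unit is backordered, so EBO = lambda
assert abs(ebo(0, 2.5) - 2.5) < 1e-8
```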
CREATE OR REPLACE FUNCTION timeebo
RETURN binary_double
IS
i pls_integer;
EBOVal binary_double;
acc binary_double;
BEGIN
acc := 0.0;
FOR i IN 1..1000000
LOOP
EBOVal := EBO(500, CAST(i AS binary_double) / 1000.0) ;
acc := acc + EBOVal;
END LOOP;
RETURN acc;
END;
This is not an answer to the OP's question (but it is a "proof of concept" to show how something like what he is doing in PL/SQL can be done in plain SQL). I only use the Answer format because what I will do below doesn't fit in a Comment.
The OP asked to see how the summation of an infinite series up to a desired degree of precision can be done in pure SQL. I will give two examples.
First example:
An infinite series with positive terms: e = sum [ j = 0 to infinity ] ( 1 / factorial(j) ). A trivial upper bound for the error for the sum from 0 to n is the last term added to the series. The recursive query below (which requires Oracle 11.1 or above - actually 11.2 the way I wrote it, with column names in the declaration, but it can be easily changed for 11.1) computes the value of e precise to 38 decimal places (the maximum precision available in Oracle). The inverse-factorial series for e converges very quickly; this only takes 35 steps and runs in less than 0.001 seconds on my old, home-class computer (which is just a big Dell tablet with a keyboard).
Edit: Duh! Only I could post something where e = 3.71828! Even though the recursive query adds ALL the terms (including 1/0!), I started the sum from 1 instead of 0. (Corrected now; the mistake was there before this edit.)
with
rec ( j, s, next_term, err ) as (
select 0, 0, 1, 2
from dual
union all
select j+1, s + next_term, next_term/(j+1), next_term
from rec
where err > power(10, -38) and j < 1000000000
)
select max(j) as steps, round(max(s), 38) as e
from rec
;
STEPS E
----- ----------------------------------------
35 2.71828182845904523536028747135266249776
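The same stopping rule is easy to render in Python with the decimal module, which makes one subtlety of the CTE explicit: because err lags next_term by one row, the first term below the threshold is still added to the sum (which is what pushes the 38th decimal to ...76 rather than ...75).

```python
from decimal import Decimal, getcontext

getcontext().prec = 45            # a few guard digits beyond the 38 we want
eps = Decimal(10) ** -38

j, s, term = 0, Decimal(0), Decimal(1)
while term > eps:                 # err > power(10, -38) in the CTE
    s += term
    j += 1
    term /= j                     # next_term / (j + 1) in the CTE
s += term  # the CTE's err lags one step, so the first sub-threshold term still enters

e_38 = s.quantize(Decimal(10) ** -38)
assert str(e_38) == "2.71828182845904523536028747135266249776"
```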
Second example:
OK, so now let's take an alternating series (where the absolute value of the last term is always an upper bound for the error), and let's take a very slow-converging one:
ln( 2 ) = sum [ j = 1 to infinity ] ( (-1)^(j - 1) / j )
The query below computes ln(2), accurate to five decimal places; here we know beforehand we need 100,000 steps, and the computation on my machine took about 1.1 seconds. (Remember though, this is a VERY slowly converging series.)
with
rec ( j, s, sgn ) as (
select 0, 0, 1
from dual
union all
select j+1, s + sgn / (j+1), -sgn
from rec
where j <= 100000
)
select 100000 as steps, round(s, 5) as ln_2
from rec
where j = 100000
;
STEPS LN_2
------ -------
100000 0.69314
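For reference, the same 100,000-term partial sum in ordinary floating point; for an alternating series the first omitted term, 1/100001, bounds the truncation error, which is why five decimals is the most this many steps can deliver.

```python
import math

# 1 - 1/2 + 1/3 - ... for 100,000 terms; matches the CTE row with j = 100000
s, sgn = 0.0, 1.0
for j in range(1, 100001):
    s += sgn / j
    sgn = -sgn

assert round(s, 5) == 0.69314        # the query's output
assert abs(s - math.log(2)) < 1e-5   # |error| <= first omitted term, 1/100001
```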
To summarise my findings based on the suggestions in the comments above: eliminating the casting has had the largest effect of the suggested refinements. Doing everything in binary_double makes the code run around 3.5x faster, at a hair over ten seconds. Applying simple_double and PRAGMA INLINE then reduces this to around 9.8s on average.
The current version of my code, with the above changes applied, is as follows. The PoissonDist function could be optimized further, but the original question was more about how to make existing code run faster than about optimizing the code per se.
EDIT: The code has been revised to reflect the correct use of PRAGMA INLINE, which reduced the execution time to around 9.5s.
create or replace FUNCTION gammln(
x IN simple_double)
RETURN simple_double
IS
--Lanczos' approximation to the Log Gamma function
ser simple_double := 1.000000000190015d;
tmp simple_double := x + 5.5d;
BEGIN
tmp :=(x + 0.5d) * Ln(tmp) - tmp;
ser := ser + 76.18009172947146d /(x + 1.0d) ;
ser := ser - 86.50532032941677d /(x + 2.0d) ;
ser := ser + 24.01409824083091d /(x + 3.0d) ;
ser := ser - 1.231739572450155d /(x + 4.0d) ;
ser := ser + 1.208650973866179E-03d /(x + 5.0d) ;
ser := ser - 5.395239384953E-06d /(x + 6.0d) ;
RETURN tmp + Ln(2.5066282746310005d * ser / x) ;
END;
/
create or replace FUNCTION PoissonDist(
k IN simple_double,
lambda IN simple_double)
RETURN simple_double
IS
BEGIN
PRAGMA INLINE (gammln, 'YES');
RETURN Exp((k * Ln(lambda)) - lambda - gammln(k + 1d)) ;
END;
/
CREATE OR REPLACE FUNCTION EBO(
stock IN simple_double,
pipeline IN simple_double,
accuracy IN simple_double DEFAULT 0.0000000001)
RETURN simple_double
IS
i simple_double := 1d;
EBO simple_double := 0d;
term simple_double := 0d;
temp simple_double := 0d;
PRAGMA INLINE(PoissonDist, 'YES') ;
PoissonVal simple_double := PoissonDist(stock + 1d, pipeline) ;
peaked BOOLEAN := false; --Flag the Poisson curve as having peaked
BEGIN
IF(pipeline = 0.0d) THEN
RETURN EBO;
END IF;
IF(PoissonVal < accuracy AND floor(pipeline) > stock) THEN --If p() is very small...
i := floor(pipeline) - stock; --Revise i to just below peak of Poisson curve
PRAGMA INLINE(PoissonDist, 'YES') ;
PoissonVal := PoissonDist(stock + i, pipeline) ; --Get p() value close to peak
temp := PoissonVal *(pipeline /(stock + i + 1d)) ; --Store Poisson value just above peak
LOOP
term := i * PoissonVal;
EBO := EBO + term;
i := i - 1d; --Work backwards
PoissonVal := PoissonVal *((stock + i + 1d) / pipeline) ; --Revise Poisson value for next time
EXIT WHEN(term < accuracy OR i = 0d) ;
END LOOP;
i := 1d + floor(pipeline) - stock;
PoissonVal := temp;
peaked := true;
END IF;
LOOP
term := i * PoissonVal;
EBO := EBO + term;
i := i + 1d;
PoissonVal := PoissonVal *(pipeline /(stock + i)) ; --Revise Poisson value for next time
IF((stock + i) > pipeline) THEN
peaked := true;
END IF;
EXIT WHEN(term < accuracy AND peaked) ;
END LOOP;
IF(EBO < accuracy) THEN
EBO := 0.0d;
END IF;
RETURN EBO;
END;
/
create or replace FUNCTION timeebo
RETURN binary_double
IS
i binary_double;
EBOVal binary_double;
acc binary_double;
BEGIN
acc := 0.0d;
FOR i IN 1d..1000000d
LOOP
PRAGMA INLINE (EBO, 'YES');
EBOVal := EBO(500d, i / 1000d) ;
acc := acc + EBOVal;
END LOOP;
RETURN acc;
END;