I am trying to perform cumulative multiplication, and I have tried two methods to do it.
DECLARE @TEST TABLE
(
PAR_COLUMN INT,
PERIOD INT,
VALUE NUMERIC(22, 6)
)
INSERT INTO @TEST VALUES
(1,601,10 ),
(1,602,20 ),
(1,603,30 ),
(1,604,40 ),
(1,605,50 ),
(1,606,60 ),
(2,601,100),
(2,602,200),
(2,603,300),
(2,604,400),
(2,605,500),
(2,606,600)
Note: the data in the VALUE column will never be integer; the values will always have a decimal part. I have kept the example values as integers only to make the approximation problem easy to see.
Method 1: here I am using the EXP + LOG + SUM() OVER (ORDER BY) technique to compute the cumulative multiplication. With this method the values are not accurate; there is a rounding/approximation issue in the result.
SELECT *,
Exp(Sum(Log(Abs(NULLIF(VALUE, 0))))
OVER(
PARTITION BY PAR_COLUMN
ORDER BY PERIOD)) AS CUM_MUL
FROM @TEST;
PAR_COLUMN PERIOD VALUE CUM_MUL
---------- ------ --------- ----------------
1 601 10.000000 10
1 602 20.000000 200 -- 10 * 20 = 200(correct)
1 603 30.000000 6000.00000000001 -- 200 * 30 = 6000.000000000 (not 6000.00000000001) incorrect
1 604 40.000000 240000
1 605 50.000000 12000000
1 606 60.000000 720000000.000001 -- 12000000 * 60 = 720000000.000000 (not 720000000.000001) incorrect
2 601 100.000000 100
2 602 200.000000 20000
2 603 300.000000 5999999.99999999 -- 20000.000000 *300.000000 = 6000000.000000 (not 5999999.99999999) incorrect
2 604 400.000000 2399999999.99999
2 605 500.000000 1199999999999.99
2 606 600.000000 719999999999998
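The drift above is not specific to SQL Server; any IEEE-754 double goes through the same rounding when you compute exp(sum(log(...))). A quick Python sketch (my own illustration, not part of the original post) replays the trick on the partition-2 values:

```python
import math

values = [100, 200, 300, 400, 500, 600]

# Cumulative product via exp(sum(log(v))) -- the same trick as method 1.
log_sum = 0.0
for v in values:
    log_sum += math.log(v)   # each log is rounded to ~15-17 significant digits
approx = math.exp(log_sum)   # exp rounds once more

# Exact integer running product for comparison.
exact = 1
for v in values:
    exact *= v               # 720000000000000, no rounding at all

# The float result is close to, but not guaranteed equal to, the exact product.
print(approx, exact)
```

The float answer agrees with the exact one only to about 15 significant digits, which is exactly the kind of trailing-digit noise shown in the output above.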
Method 2: a recursive CTE. This method works perfectly, without any rounding or approximation problem.
;WITH CTE
AS (SELECT TOP 1 WITH TIES PAR_COLUMN,
PERIOD,
VALUE,
CUM_MUL = VALUE
FROM @TEST
ORDER BY PERIOD
UNION ALL
SELECT T.PAR_COLUMN,
T.PERIOD,
T.VALUE,
Cast(T.VALUE * C.CUM_MUL AS NUMERIC(22, 6))
FROM CTE C
INNER JOIN @TEST T
ON C.PAR_COLUMN = T.PAR_COLUMN
AND T.PERIOD = C.PERIOD + 1)
SELECT *
FROM CTE
ORDER BY PAR_COLUMN,PERIOD
PAR_COLUMN PERIOD VALUE CUM_MUL
---------- ------ --------- ----------------
1 601 10.000000 10.000000
1 602 20.000000 200.000000
1 603 30.000000 6000.000000
1 604 40.000000 240000.000000
1 605 50.000000 12000000.000000
1 606 60.000000 720000000.000000
2 601 100.000000 100.000000
2 602 200.000000 20000.000000
2 603 300.000000 6000000.000000
2 604 400.000000 2400000000.000000
2 605 500.000000 1200000000000.000000
2 606 600.000000 720000000000000.000000
Can anyone tell me why the values in method 1 are not accurate, and how to fix it? I tried changing the data type to FLOAT and increasing the scale of the NUMERIC type, but it did not help. I really want to use method 1, which is much faster than method 2.
Edit: Now I know the reason for the approximation. Can anyone find a fix for this problem?
In pure T-SQL, LOG and EXP operate with the float type (8 bytes), which has only 15-17 significant digits. Even that last 15th digit can become inaccurate if you sum large enough values. Your data is numeric(22,6), so 15 significant digits is not enough.
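The 15-17 digit limit is just the 53-bit significand of an IEEE-754 double, and it can be shown in any language. A standard-library Python sketch (my own illustration):

```python
# A double has a 53-bit significand, so integers above 2**53 can no longer
# all be represented exactly -- adjacent integers collapse to the same float.
big = 2 ** 53                          # 9007199254740992, about 9.0e15 (16 digits)
assert float(big) == float(big + 1)    # the +1 is lost entirely
assert float(big) != float(big + 2)    # +2 survives, because it is even

# numeric(22,6) carries 22 significant digits -- more than float can hold.
print(float(big), float(big + 1))
```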
POWER can return the numeric type with potentially higher precision, but it is of little use to us, because both LOG and LOG10 can return only float anyway.
To demonstrate the problem, I'll change the type in your example to numeric(15,0) and use POWER instead of EXP:
DECLARE @TEST TABLE
(
PAR_COLUMN INT,
PERIOD INT,
VALUE NUMERIC(15, 0)
);
INSERT INTO @TEST VALUES
(1,601,10 ),
(1,602,20 ),
(1,603,30 ),
(1,604,40 ),
(1,605,50 ),
(1,606,60 ),
(2,601,100),
(2,602,200),
(2,603,300),
(2,604,400),
(2,605,500),
(2,606,600);
SELECT *,
POWER(CAST(10 AS numeric(15,0)),
Sum(LOG10(
Abs(NULLIF(VALUE, 0))
))
OVER(PARTITION BY PAR_COLUMN ORDER BY PERIOD)) AS Mul
FROM @TEST;
Result
+------------+--------+-------+-----------------+
| PAR_COLUMN | PERIOD | VALUE | Mul |
+------------+--------+-------+-----------------+
| 1 | 601 | 10 | 10 |
| 1 | 602 | 20 | 200 |
| 1 | 603 | 30 | 6000 |
| 1 | 604 | 40 | 240000 |
| 1 | 605 | 50 | 12000000 |
| 1 | 606 | 60 | 720000000 |
| 2 | 601 | 100 | 100 |
| 2 | 602 | 200 | 20000 |
| 2 | 603 | 300 | 6000000 |
| 2 | 604 | 400 | 2400000000 |
| 2 | 605 | 500 | 1200000000000 |
| 2 | 606 | 600 | 720000000000001 |
+------------+--------+-------+-----------------+
Each step here loses precision. Calculating LOG loses precision, SUM loses precision, and EXP/POWER loses precision. With these built-in functions I don't think you can do much about it.
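The three lossy steps can be replayed one at a time outside SQL. This Python sketch (my own illustration, mirroring the LOG10/SUM/POWER demo above) prints the absolute error of the running product after each round-trip:

```python
import math

values = [100, 200, 300, 400, 500, 600]

exact = 1
log10_sum = 0.0
for v in values:
    exact *= v                   # exact integer running product, for reference
    log10_sum += math.log10(v)   # LOG10: rounded to ~15-17 significant digits
    recovered = 10 ** log10_sum  # POWER: rounds once more
    # Show how far the recovered running product drifts from the exact one.
    print(v, recovered, abs(recovered - exact))
```

The error is tiny in relative terms but grows with the magnitude of the running product, which is why the last rows of the output above are the ones that visibly miss.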
So, the answer is: use CLR with the C# decimal type (not double), which supports higher precision (28-29 significant digits). Your original SQL type numeric(22,6) would fit into it. And you wouldn't need the trick with LOG/EXP.
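The idea is simply exact multiplication in a wide decimal type, and C# is not the only place that has one. A Python sketch of the same approach, using the standard decimal module (28 significant digits by default, the same order as C#'s decimal):

```python
from decimal import Decimal, getcontext

getcontext().prec = 28   # same order of precision as C#'s decimal

values = [Decimal("100"), Decimal("200"), Decimal("300"),
          Decimal("400"), Decimal("500"), Decimal("600")]

product = Decimal(1)
running = []
for v in values:
    product *= v           # exact decimal multiplication, no log/exp round-trip
    running.append(product)

print(running[-1])         # 720000000000000
```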
Oops. I tried to make a CLR aggregate that calculates Product. It works in my tests, but only as a simple aggregate. This works:
SELECT T.PAR_COLUMN, [dbo].[Product](T.VALUE) AS P
FROM @TEST AS T
GROUP BY T.PAR_COLUMN;
And even OVER (PARTITION BY) works:
SELECT *,
[dbo].[Product](T.VALUE)
OVER (PARTITION BY PAR_COLUMN) AS P
FROM @TEST AS T;
But a running product using OVER (PARTITION BY ... ORDER BY ...) doesn't work (checked with SQL Server 2014 Express 12.0.2000.8):
SELECT *,
[dbo].[Product](T.VALUE)
OVER (PARTITION BY T.PAR_COLUMN ORDER BY T.PERIOD
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS CUM_MUL
FROM @TEST AS T;
Incorrect syntax near the keyword 'ORDER'.
A search found this connect item, which is closed as "Won't Fix", and this question.
The C# code:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.IO;
using System.Collections.Generic;
using System.Text;
namespace RunningProduct
{
[Serializable]
[SqlUserDefinedAggregate(
Format.UserDefined,
MaxByteSize = 17,
IsInvariantToNulls = true,
IsInvariantToDuplicates = false,
IsInvariantToOrder = true,
IsNullIfEmpty = true)]
public struct Product : IBinarySerialize
{
private bool m_bIsNull; // 1 byte storage
private decimal m_Product; // 16 bytes storage
public void Init()
{
this.m_bIsNull = true;
this.m_Product = 1;
}
public void Accumulate(
[SqlFacet(Precision = 22, Scale = 6)] SqlDecimal ParamValue)
{
if (ParamValue.IsNull) return;
this.m_bIsNull = false;
this.m_Product *= ParamValue.Value;
}
public void Merge(Product other)
{
SqlDecimal otherValue = other.Terminate();
this.Accumulate(otherValue);
}
[return: SqlFacet(Precision = 22, Scale = 6)]
public SqlDecimal Terminate()
{
if (m_bIsNull)
{
return SqlDecimal.Null;
}
else
{
return m_Product;
}
}
public void Read(BinaryReader r)
{
this.m_bIsNull = r.ReadBoolean();
this.m_Product = r.ReadDecimal();
}
public void Write(BinaryWriter w)
{
w.Write(this.m_bIsNull);
w.Write(this.m_Product);
}
}
}
Install CLR assembly:
-- Turn advanced options on
EXEC sys.sp_configure @configname = 'show advanced options', @configvalue = 1 ;
GO
RECONFIGURE WITH OVERRIDE ;
GO
-- Enable CLR
EXEC sys.sp_configure @configname = 'clr enabled', @configvalue = 1 ;
GO
RECONFIGURE WITH OVERRIDE ;
GO
CREATE ASSEMBLY [RunningProduct]
AUTHORIZATION [dbo]
FROM 'C:\RunningProduct\RunningProduct.dll'
WITH PERMISSION_SET = SAFE;
GO
CREATE AGGREGATE [dbo].[Product](@ParamValue numeric(22,6))
RETURNS numeric(22,6)
EXTERNAL NAME [RunningProduct].[RunningProduct.Product];
GO
This question discusses the calculation of a running SUM in great detail, and Paul White shows in his answer how to write a CLR function that calculates a running SUM efficiently. It would be a good start for writing a function that calculates a running product.
Note that he uses a different approach. Instead of making a custom aggregate function, Paul makes a function that returns a table. The function reads the original data into memory and performs all the required calculations.
It may be easier to achieve the desired effect by implementing these calculations on the client side, in the programming language of your choice: just read the whole table and calculate the running product on the client. Creating a CLR function makes sense if the running product calculated on the server is an intermediate step in a more complex calculation that would aggregate the data further.
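A client-side running product is short in any language. For example, in Python (a sketch; the row data is inlined here rather than fetched from the database):

```python
from decimal import Decimal
from itertools import groupby

# (PAR_COLUMN, PERIOD, VALUE) rows, as they would come back from the table.
rows = [
    (1, 601, Decimal("10")),  (1, 602, Decimal("20")),  (1, 603, Decimal("30")),
    (2, 601, Decimal("100")), (2, 602, Decimal("200")), (2, 603, Decimal("300")),
]

result = []
# Group by the partition column; rows must be sorted by (PAR_COLUMN, PERIOD).
for par, group in groupby(sorted(rows), key=lambda r: r[0]):
    cum = Decimal(1)
    for _, period, value in group:
        cum *= value                      # exact decimal arithmetic
        result.append((par, period, cum))

print(result)
```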
One more idea that comes to mind: find a third-party .NET math library that offers Log and Exp functions with high precision, and make CLR versions of these scalar functions. Then use the EXP + LOG + SUM() Over (Order by) approach, where SUM is the built-in T-SQL function that supports Over (Order by), and Exp and Log are custom CLR functions that return not float, but a high-precision decimal.
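Python's decimal module can stand in for such a high-precision math library to show that the idea works: with enough digits, the EXP + SUM(LN) round-trip comes back exact after rounding to the original scale (a sketch assuming 50-digit precision):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50    # far more than float's ~15-17 significant digits

values = [Decimal(v) for v in (100, 200, 300, 400, 500, 600)]

# The same EXP + SUM(LOG) trick, but in 50-digit decimal arithmetic.
log_sum = sum(v.ln() for v in values)
product = log_sum.exp()

# Rounding back to the numeric(22,6) scale recovers the exact product.
rounded = product.quantize(Decimal("1.000000"))
print(rounded)            # 720000000000000.000000
```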
Note that high-precision calculations may also be slow, and using CLR scalar functions in the query may make it slow as well.
LOG() and EXP() implicitly convert their arguments to the float data type, which holds only approximate values.
You can round to a big multiple. For your data:
--720000000000000 must be multiple of 600
select
round( 719999999999998/600, 0 ) * 600
--result: 720000000000000
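The same correction expressed in Python, just to show the arithmetic (the divisor 600 is the known factor the true product must be a multiple of):

```python
# The drifted float result from the EXP/LOG method:
drifted = 719999999999998.0

# The true product must be a multiple of the last factor, 600,
# so snap the value to the nearest multiple of 600.
fixed = round(drifted / 600) * 600
print(fixed)   # 720000000000000
```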
Test it at SQLFiddle
create TABLE T
(
PAR_COLUMN INT,
PERIOD INT,
VALUE NUMERIC(22, 6)
)
INSERT INTO T VALUES
(1,601,10.1 ), --<--- I put decimals just to test!
(1,602,20 ),
(1,603,30 ),
(1,604,40 ),
(1,605,50 ),
(1,606,60 ),
(2,601,100),
(2,602,200),
(2,603,300),
(2,604,400),
(2,605,500),
(2,606,600)
Query 1:
with T1 as (
SELECT *,
Exp(Sum(Log(Abs(NULLIF(VALUE, 0))))
OVER(
PARTITION BY PAR_COLUMN
ORDER BY PERIOD)) AS CUM_MUL,
VALUE AS CUM_MAX1,
LAG( VALUE , 1, 1.)
OVER(
PARTITION BY PAR_COLUMN
ORDER BY PERIOD ) AS CUM_MAX2,
LAG( VALUE , 2, 1.)
OVER(
PARTITION BY PAR_COLUMN
ORDER BY PERIOD ) AS CUM_MAX3
FROM T )
select PAR_COLUMN, PERIOD, VALUE,
( round( ( CUM_MUL / ( CUM_MAX1 * CUM_MAX2 * CUM_MAX3) ) ,6)
*
cast( ( 1000000 * CUM_MAX1 * CUM_MAX2 * CUM_MAX3) as bigint )
) / 1000000.
as CUM_MUL
FROM T1
Results:
| PAR_COLUMN | PERIOD | VALUE | CUM_MUL |
|------------|--------|-------|-----------------|
| 1 | 601 | 10.1 | 10.1 | --ok! because my data
| 1 | 602 | 20 | 202 |
| 1 | 603 | 30 | 6060 |
| 1 | 604 | 40 | 242400 |
| 1 | 605 | 50 | 12120000 |
| 1 | 606 | 60 | 727200000 |
| 2 | 601 | 100 | 100 |
| 2 | 602 | 200 | 20000 |
| 2 | 603 | 300 | 6000000 |
| 2 | 604 | 400 | 2400000000 |
| 2 | 605 | 500 | 1200000000000 |
| 2 | 606 | 600 | 720000000000000 |
Notice that I multiply by 1,000,000 so the rounding works with whole numbers, without decimals.
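In plain arithmetic, the query above divides the drifted cumulative product by the last three exact values (kept via LAG), rounds the now-small quotient, then multiplies the exact tail back in. A Python sketch of the same correction, using the drifted value from the question:

```python
# Drifted running product for partition 2, period 606 (from the question):
drifted = 719999999999998.0

# The last three exact VALUEs, carried along via LAG() in the SQL query:
v1, v2, v3 = 400, 500, 600
tail = v1 * v2 * v3                   # 120000000, an exact integer product

# Divide out the tail, round the small quotient to 6 decimal places,
# then multiply the exact tail back in.
quotient = round(drifted / tail, 6)   # 6000000.0
fixed = int(quotient * tail)
print(fixed)   # 720000000000000
```

This works because dividing by the recent exact factors shrinks the number into a range where 6 decimal places of rounding are enough to absorb the float error.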