I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):
(A1*B1)+(A2*B2)+...+(An*Bn)/SUM(A1:An)
The first part reflects the same functionality as Excel's SUMPRODUCT() function.
My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.
EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:
select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Weighty'
) baseSeries inner join (
select RecordDate , Actual
from CalcProductionRecords
where KPI = 'Tons Milled'
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
SUMPRODUCT is essentially the Sum of Test 1 times its weight, plus the Mid-term times its weight, and so on. To get the Weighted Average, you divide by the Total of the weights. If we had just averaged the Test scores, the value would be 75.5, a significant difference.
Use the aggregate function SUMPRODUCT to perform elementwise multiplication on the supplied matrices and return the sum of those products. The string representation of a matrix with columns separated by commas and rows separated by semi-colons.
The weighted average function seq_wavg() calculates the average as the sum of the products of the two sequence elements divided by the sum of sequence 1.
How to use SUMPRODUCT with Multiple Criteria in Excel? We can use it in place of formulas like SUMIF. The criteria can include dates, numbers, and text. For example, the formula “=SUMIF(B1:B5, “<=12”)” adds the values in the cell range B1:B5, which are less than or equal to 12.
Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...
SELECT
SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
[tbl]
WHERE
[tbl].date >= '2009 Jan 01'
AND [tbl].date < '2010 Jan 01'
The more complex part is where you want to "dynamically specify" the what field is [data] and what field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
Dynamic SQL, however, carries it's own overhead. Is the queries are relatively infrequent , or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic sql introduces a noticable overhead. (Not to mention being careful of SQL injection attacks, etc.)
EDIT:
In your lastest example you highlight three fields:
When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.
Some questions I have are:
The reason I ask being that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resultsing records)
Regardless, a slight simplification of your query is certainly possible...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
CalcProductionRecords AS [baseSeries]
INNER JOIN
CalcProductionRecords AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
-- AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
[baseSeries].KPI = 'Tons Milled'
AND [weightSeries].KPI = 'Weighty'
The commented out line only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
If you can't guarnatee just One value per date, and don't have any other fields to join on, you can modify your sub_query based version slightly...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
SELECT
RecordDate,
SUM(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Tons Milled'
GROUP BY
RecordDate
)
AS [baseSeries]
INNER JOIN
(
SELECT
RecordDate,
AVG(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Weighty'
GROUP BY
RecordDate
)
AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
This assumes the AVG of the weight is valid if there are multiple weights for the same day.
EDIT : Someone just voted for this so I thought I'd improve the final answer :)
SELECT
SUM(Actual * Weight) / SUM(Weight)
FROM
(
SELECT
RecordDate,
SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
FROM
CalcProductionRecords
WHERE
KPI IN ('Tons Milled', 'Weighty')
GROUP BY
RecordDate
)
AS pivotAggregate
This avoids the JOIN and also only scans the table once.
It relies on the fact that NULL
values are ignored when calculating the AVG()
.
SELECT SUM(A * B) / SUM(A)
FROM mytable
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With