Weighted average in T-SQL (like Excel's SUMPRODUCT)

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):

((A1*B1)+(A2*B2)+...+(An*Bn))/SUM(A1:An)

The numerator is exactly what Excel's SUMPRODUCT() function computes.
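As a quick arithmetic check of the formula, here is a minimal Python sketch. The function name `weighted_average` and the sample numbers are hypothetical; A is treated as the weights and B as the values, matching the division by SUM(A1:An).

```python
def weighted_average(weights, values):
    """Return SUMPRODUCT(weights, values) / SUM(weights)."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ZeroDivisionError("weights sum to zero")
    # Numerator: the SUMPRODUCT part, (A1*B1)+(A2*B2)+...+(An*Bn)
    sumproduct = sum(a * b for a, b in zip(weights, values))
    return sumproduct / total_weight


# Hypothetical sample: weights 1, 2, 3 and values 10, 20, 30,
# so the result is (10 + 40 + 90) / 6.
result = weighted_average([1, 2, 3], [10, 20, 30])
print(result)
```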

The catch is that I need to specify dynamically which row is averaged, which row supplies the weights, and a date range.

EDIT: This is easier than I thought; Excel had me thinking I needed some kind of pivot. My solution so far is:

select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Weighty'
) baseSeries inner join (       
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Tons Milled'   
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
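To see the join-then-aggregate idea run end to end, here is a sketch that executes the same query against an in-memory SQLite database through Python's `sqlite3` module. SQLite is only a stand-in for SQL Server here, and the table rows are hypothetical; the date range is not applied yet.

```python
import sqlite3

# In-memory stand-in for CalcProductionRecords; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CalcProductionRecords (RecordDate TEXT, KPI TEXT, Actual REAL)")
conn.executemany(
    "INSERT INTO CalcProductionRecords VALUES (?, ?, ?)",
    [("2009-11-01", "Weighty", 2.0), ("2009-11-01", "Tons Milled", 100.0),
     ("2009-11-02", "Weighty", 4.0), ("2009-11-02", "Tons Milled", 300.0)],
)

# The question's query: pair the two KPI series by date, then
# SUMPRODUCT(values, weights) / SUM(weights).
query = """
select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
    select RecordDate, Actual from CalcProductionRecords where KPI = 'Weighty'
) baseSeries inner join (
    select RecordDate, Actual from CalcProductionRecords where KPI = 'Tons Milled'
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
"""
result = conn.execute(query).fetchone()[0]
print(result)
```

With these rows the result is (2*100 + 4*300) / 400.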
ProfK asked Nov 08 '09 23:11




2 Answers

Quassnoi's answer shows how to compute the SUMPRODUCT; adding a WHERE clause lets you restrict the calculation to a date range:

SELECT
   SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
   [tbl]
WHERE
   [tbl].date >= '20090101'   -- ISO 'YYYYMMDD' literals do not depend on language settings
   AND [tbl].date < '20100101'
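Here is a minimal sketch of the date-restricted version, with the range passed as bound parameters rather than literals. It uses SQLite through Python's `sqlite3` as a stand-in for SQL Server, stores dates as ISO-8601 text (which compares correctly as strings), and the table contents and the helper name `weighted_avg_between` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one data column and one weight column per row.
conn.execute("CREATE TABLE tbl (date TEXT, data REAL, weight REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [("2008-12-31", 99.0, 1.0),   # before the range: excluded
     ("2009-03-15", 10.0, 1.0),
     ("2009-09-30", 20.0, 3.0),
     ("2010-01-01", 50.0, 5.0)],  # upper bound is exclusive: excluded
)


def weighted_avg_between(conn, start, end):
    # Half-open range [start, end), matching the >= / < predicates above.
    sql = """
        SELECT SUM(data * weight) / SUM(weight)
        FROM tbl
        WHERE date >= ? AND date < ?
    """
    return conn.execute(sql, (start, end)).fetchone()[0]


result = weighted_avg_between(conn, "2009-01-01", "2010-01-01")
print(result)
```

The half-open range avoids missing rows that carry a time component on the last day.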

The more complex part is "dynamically specifying" which field is [data] and which is [weight]. The short answer is that, realistically, you would have to use dynamic SQL, along these lines:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string

Dynamic SQL, however, carries its own overhead. If the queries are relatively infrequent, or the query itself takes relatively long to execute, this may not matter. If they are frequent and short, however, you may notice that dynamic SQL introduces noticeable overhead. (Not to mention the need to guard against SQL injection attacks, etc.)
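In SQL Server itself this would typically mean building the string with QUOTENAME() around the column names and running it with sp_executesql, so that the dates stay real parameters. The sketch below shows the same template / replace / execute steps in Python against SQLite. The column names, the whitelist and the helper `dynamic_weighted_avg` are all hypothetical. Column names cannot be bound as parameters, so they are checked against a fixed set before being substituted into the SQL text.

```python
import sqlite3

# Only these (hypothetical) column names may be substituted into the SQL text.
ALLOWED_COLUMNS = {"output", "grade", "tons"}

# Step 1: the string template.
TEMPLATE = (
    "SELECT SUM(tbl.{data} * tbl.{weight}) / SUM(tbl.{weight}) "
    "FROM tbl WHERE tbl.date >= ? AND tbl.date < ?"
)


def dynamic_weighted_avg(conn, data_col, weight_col, start, end):
    for col in (data_col, weight_col):
        if col not in ALLOWED_COLUMNS:
            raise ValueError(f"column not allowed: {col!r}")
    # Steps 2-3: substitute the chosen data and weight fields.
    sql = TEMPLATE.format(data=data_col, weight=weight_col)
    # Step 4: execute, binding the date range as ordinary parameters.
    return conn.execute(sql, (start, end)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (date TEXT, output REAL, grade REAL, tons REAL)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [("2009-02-01", 5.0, 2.0, 10.0), ("2009-06-01", 7.0, 4.0, 30.0)],
)
result = dynamic_weighted_avg(conn, "grade", "tons", "2009-01-01", "2010-01-01")
print(result)
```

A whitelist is stricter than quoting alone: even a correctly quoted but unexpected column name is rejected.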

EDIT:

Your latest example uses three fields:

  • RecordDate
  • KPI
  • Actual

When [KPI] is 'Weighty', [Actual] is the weighting factor to use.
When [KPI] is 'Tons Milled', [Actual] is the data you want to aggregate.


Some questions I have are:

  • Are there any other fields?
  • Is there only ever ONE actual per date per KPI?

I ask because you need to ensure that the JOIN is strictly 1:1. (You don't want 5 Actuals joining with 5 Weights and producing 25 resulting records.)
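The 5 x 5 = 25 fan-out is easy to reproduce. This sketch uses SQLite via Python's `sqlite3` as a stand-in and hypothetical rows: five data rows and five weight rows all on the same date, joined on RecordDate alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CalcProductionRecords (RecordDate TEXT, KPI TEXT, Actual REAL)")
# Hypothetical: five rows of each KPI on the SAME date.
rows = [("2009-11-01", "Tons Milled", float(i)) for i in range(5)]
rows += [("2009-11-01", "Weighty", 1.0) for _ in range(5)]
conn.executemany("INSERT INTO CalcProductionRecords VALUES (?, ?, ?)", rows)

# Joining on RecordDate alone pairs every data row with every weight row.
joined = conn.execute("""
    SELECT COUNT(*)
    FROM CalcProductionRecords AS b
    JOIN CalcProductionRecords AS w ON w.RecordDate = b.RecordDate
    WHERE b.KPI = 'Tons Milled' AND w.KPI = 'Weighty'
""").fetchone()[0]
print(joined)  # 25
```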

Either way, your query can be simplified slightly:

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
   CalcProductionRecords AS [baseSeries]
INNER JOIN
   CalcProductionRecords AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate
--    AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
   [baseSeries].KPI = 'Tons Milled'
   AND [weightSeries].KPI = 'Weighty'

The commented-out line is only needed if you require additional predicates to guarantee a 1:1 relationship between your data and the weights.
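Here is a sketch of the simplified self-join running on hypothetical data with exactly one value per date per KPI, so the join is 1:1. SQLite via Python's `sqlite3` stands in for SQL Server, and the square-bracket quoting is dropped because none of the names need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CalcProductionRecords (RecordDate TEXT, KPI TEXT, Actual REAL)")
conn.executemany(
    "INSERT INTO CalcProductionRecords VALUES (?, ?, ?)",
    [("2009-11-01", "Tons Milled", 100.0), ("2009-11-01", "Weighty", 1.0),
     ("2009-11-02", "Tons Milled", 200.0), ("2009-11-02", "Weighty", 3.0)],
)

# One scan per alias of the same table; the KPI filters move to WHERE.
result = conn.execute("""
    SELECT SUM(baseSeries.Actual * weightSeries.Actual) / SUM(weightSeries.Actual)
    FROM CalcProductionRecords AS baseSeries
    INNER JOIN CalcProductionRecords AS weightSeries
        ON weightSeries.RecordDate = baseSeries.RecordDate
    WHERE baseSeries.KPI = 'Tons Milled'
      AND weightSeries.KPI = 'Weighty'
""").fetchone()[0]
print(result)
```

Here the result is (100*1 + 200*3) / 4.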


If you can't guarantee just one value per date, and don't have any other fields to join on, you can modify your subquery-based version slightly:

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
    SELECT
        RecordDate,
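        SUM(Actual) AS Actual   -- derived-table columns must be named for [baseSeries].Actual to resolve
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Tons Milled'
    GROUP BY
        RecordDate
)
   AS [baseSeries]
INNER JOIN
(
    SELECT
        RecordDate,
        AVG(Actual) AS Actual   -- likewise for [weightSeries].Actual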
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Weighty'
    GROUP BY
        RecordDate
)
   AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate

This assumes that averaging the weights is valid when a day has more than one weight.
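To show what pre-aggregating per date buys you, here is a sketch that runs both the naive join and the grouped version on the same hypothetical rows, which include duplicate values on one date. SQLite via Python's `sqlite3` stands in for SQL Server. Each derived-table column is given a name (`AS Actual`), which SQL Server requires for derived tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CalcProductionRecords (RecordDate TEXT, KPI TEXT, Actual REAL)")
conn.executemany(
    "INSERT INTO CalcProductionRecords VALUES (?, ?, ?)",
    [("2009-11-01", "Tons Milled", 60.0), ("2009-11-01", "Tons Milled", 40.0),
     ("2009-11-01", "Weighty", 1.0), ("2009-11-01", "Weighty", 3.0),
     ("2009-11-02", "Tons Milled", 50.0), ("2009-11-02", "Weighty", 2.0)],
)

# Naive join: the duplicate rows on 2009-11-01 fan out to 2 x 2 pairs.
naive = conn.execute("""
    SELECT SUM(b.Actual * w.Actual) / SUM(w.Actual)
    FROM CalcProductionRecords AS b
    JOIN CalcProductionRecords AS w ON w.RecordDate = b.RecordDate
    WHERE b.KPI = 'Tons Milled' AND w.KPI = 'Weighty'
""").fetchone()[0]

# Pre-aggregated: one row per date on each side, so the join is 1:1.
grouped = conn.execute("""
    SELECT SUM(baseSeries.Actual * weightSeries.Actual) / SUM(weightSeries.Actual)
    FROM (
        SELECT RecordDate, SUM(Actual) AS Actual
        FROM CalcProductionRecords
        WHERE KPI = 'Tons Milled'
        GROUP BY RecordDate
    ) AS baseSeries
    INNER JOIN (
        SELECT RecordDate, AVG(Actual) AS Actual
        FROM CalcProductionRecords
        WHERE KPI = 'Weighty'
        GROUP BY RecordDate
    ) AS weightSeries
        ON weightSeries.RecordDate = baseSeries.RecordDate
""").fetchone()[0]
print(naive, grouped)
```

The grouped query treats 2009-11-01 as 100 tons with weight 2, giving (200 + 100) / 4. The naive join instead counts every pairing and returns a different figure.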


EDIT: Someone just upvoted this, so I thought I'd improve the final answer :)

SELECT
   SUM(Actual * Weight) / SUM(Weight)
FROM
(
    SELECT
        RecordDate,
        SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END)   AS Actual,
        AVG(CASE WHEN KPI = 'Weighty'     THEN Actual ELSE NULL END)   AS Weight
    FROM
        CalcProductionRecords
    WHERE
        KPI IN ('Tons Milled', 'Weighty')
    GROUP BY
        RecordDate
)
   AS pivotAggregate

This avoids the JOIN and also only scans the table once.

It relies on the fact that aggregate functions such as SUM() and AVG() ignore NULL values.
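Here is a sketch of the single-scan conditional-aggregation version on the same kind of hypothetical data, with duplicate rows on one date. SQLite via Python's `sqlite3` stands in for SQL Server. Inside each CASE, rows of the other KPI become NULL, and SUM()/AVG() skip them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CalcProductionRecords (RecordDate TEXT, KPI TEXT, Actual REAL)")
conn.executemany(
    "INSERT INTO CalcProductionRecords VALUES (?, ?, ?)",
    [("2009-11-01", "Tons Milled", 60.0), ("2009-11-01", "Tons Milled", 40.0),
     ("2009-11-01", "Weighty", 1.0), ("2009-11-01", "Weighty", 3.0),
     ("2009-11-02", "Tons Milled", 50.0), ("2009-11-02", "Weighty", 2.0)],
)

result = conn.execute("""
    SELECT SUM(Actual * Weight) / SUM(Weight)
    FROM (
        SELECT
            RecordDate,
            -- 'Weighty' rows yield NULL here and are ignored by SUM()
            SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
            -- 'Tons Milled' rows yield NULL here and are ignored by AVG()
            AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
        FROM CalcProductionRecords
        WHERE KPI IN ('Tons Milled', 'Weighty')
        GROUP BY RecordDate
    ) AS pivotAggregate
""").fetchone()[0]
print(result)
```

It produces the same answer as the pre-aggregated join: 2009-11-01 becomes 100 tons with an average weight of 2, and 2009-11-02 becomes 50 tons with a weight of 2.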

MatBailie answered Sep 24 '22 06:09


SELECT  SUM(A * B) / SUM(A)
FROM    mytable
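One thing to watch with this one-liner: in SQL Server, dividing an integer by an integer yields an integer, so if A and B are integer columns the weighted average is truncated. A common workaround is to multiply by 1.0 (or CAST to a decimal type) before dividing. SQLite behaves the same way for integers, so the sketch below, using Python's `sqlite3` and hypothetical data, shows both forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical integer columns: A = weight, B = value.
conn.execute("CREATE TABLE mytable (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [(1, 10), (2, 15)])

# SUM(A*B) = 40 and SUM(A) = 3, both integers: integer division truncates.
truncated = conn.execute("SELECT SUM(A * B) / SUM(A) FROM mytable").fetchone()[0]
# Promoting the numerator to a non-integer type keeps the fraction.
exact = conn.execute("SELECT SUM(A * B) * 1.0 / SUM(A) FROM mytable").fetchone()[0]
print(truncated, exact)
```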
Quassnoi answered Sep 24 '22 06:09