Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weighted trendline

Excel produces scatter diagrams for sets of pair values. It also gives the option of producing a best fit trendline and formula for the trendline. It also produces bubble diagrams which take into consideration a weight provided with each value. However, the weight has no influence on the trendline or formula. Here is an example set of values, with their mappings and weights.

    Value Map     Weight
    0       1      10
    1       2      10
    2       5      10
    3       5      20
    4       6      20
    5       1      1

With Excel's trendline, the mapping for value 5 has too much influence on the formula. Is there any way to produce a formula that reflects the respective weights?

As a help, I've introduced a weighted average for five consecutive values. But is their a better way?

like image 959
Tams Avatar asked Jun 18 '12 17:06

Tams


2 Answers

With data in A2:C7, based on the standard weighted least squares formula, you can try:

=LINEST(B2:B7*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0)

entered with CTRL+SHIFT+ENTER in E2:F2 or any 2x1 range. This also returns {1.1353,1.4412}.

For Rsquared you can enter:

=INDEX(LINEST((B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))*C2:C7^0.5,IF({1,0},1,A2:A7)*C2:C7^0.5,0,1),3,1)

Explanation of formula

Consider first a normal regression of y on X using LINEST. If const = TRUE the regressor matrix is the augmented matrix consisting of a column of ones followed by the regressor columns i.e. X'=(1,X). If const = FALSE the regressor matrix is simply X so running the regression with a column of ones included gives the same estimates as running without a column of ones and setting const=TRUE.

Now consider a weighted least squares regression. The regression is now Wy on WX'=(W1,WX) where W is the diagonal matrix consisting of the square root of the weights. Since there is not a column of ones present, we must set const = FALSE and use two columns in the regressor matrix.

Rsquared Calculation

Setting stats to TRUE in the LINEST output of the first formula we get in the third and fifth rows:

SSres = 59.76
SSreg(u) = 1461.24
SSTot(u) = 1521
Rsq(u) = 1 - 59.76/1521 = 0.9607 

Note these values are uncentered versions (u) since the const=FALSE (refer to MS Help on LINEST for further info.) For the centered versions (c) we need to subtract the weighted average as below:

SSTot(c) =SUMPRODUCT(C2:C7*(B2:B7-SUM(B2:B7*C2:C7)/SUM(C2:C7))^2) = 244.93
Rsq(c) = 1 - 59.76/244.93 = 0.756
like image 182
lori_m Avatar answered Nov 13 '22 00:11

lori_m


Update
Based on the additional information that you have tens of thousands of rows, here is a VBA UDF that will do the job (including the r2)

As per the screenshot below it provides the same m , x and r2 values that my expanded data set did in my original answer

enter image description here

Public Function LinestWeighted(xRng As Range, yRng As Range, wRng As Range, bInt As Boolean, bStat As Boolean) As Variant
    Dim x As Variant
    Dim y As Variant
    Dim W As Variant
    Dim TotX As Variant
    Dim TotY As Variant
    Dim lngRow As Long
    Dim strDelim As String
    Dim strX As String
    Dim strY As String
    Dim NewSeries As Variant

    x = Application.Transpose(xRng)
    y = Application.Transpose(yRng)
    W = Application.Transpose(wRng)
    strDelim = ","

    If (UBound(x, 1) = UBound(y, 1)) And (UBound(x, 1) = UBound(W, 1)) Then
        For lngRow = 1 To UBound(W)
            strX = strX & Application.WorksheetFunction.Rept(x(lngRow) & strDelim, W(lngRow))
            strY = strY & Application.WorksheetFunction.Rept(y(lngRow) & strDelim, W(lngRow))
        Next lngRow
        TotX = Split(Left$(strX, Len(strX) - 1), strDelim)
        TotY = Split(Left$(strY, Len(strY) - 1), strDelim)
        ReDim NewSeries(1 To UBound(TotX) + 1, 1 To 2)
        For lngRow = 0 To UBound(TotX)
            NewSeries(lngRow + 1, 1) = CDbl(TotX(lngRow))
            NewSeries(lngRow + 1, 2) = CDbl(TotY(lngRow))
        Next
        With Application
            LinestWeighted = .WorksheetFunction.LinEst(.Index(.Transpose(NewSeries), 2), .Index(.Transpose(NewSeries), 1), bInt, bStat)
        End With
    Else
        LinestWeighted = "input ranges must be equal in length"
        Exit Function
    End If
End Function

Initial Answer

Just expand out your data series by your weighting factors

So rather than try to graph 6 pairs, use the ratio of your highest to lowest to repeat the points

i.e. graph

0       1     `10 times`  
1       2     `10 times`    
...
5       1     `once`    

enter image description here

like image 36
brettdj Avatar answered Nov 13 '22 02:11

brettdj