"weighted" regression in R

I wrote the script below to run what I call a "weighted" regression:

library(plyr)

set.seed(100)

temp.df <- data.frame(uid=1:200,
                      bp=sample(x=c(100:200),size=200,replace=TRUE),
                      age=sample(x=c(30:65),size=200,replace=TRUE),
                      weight=sample(c(1:10),size=200,replace=TRUE),
                      stringsAsFactors=FALSE)

temp.df.expand <- ddply(temp.df,
                        c("uid"),
                        function(df) {
                          data.frame(bp=rep(df[,"bp"],df[,"weight"]),
                                     age=rep(df[,"age"],df[,"weight"]),
                                     stringsAsFactors=FALSE)})

temp.df.lm <- lm(bp~age,data=temp.df,weights=weight)
temp.df.expand.lm <- lm(bp~age,data=temp.df.expand)

As you can see, each row in temp.df has a weight. What I mean is that there are 1178 samples in total, but rows with the same bp and age have been merged into one row, with the count recorded in the weight column.

I passed the weight column to the weights parameter of the lm function, then cross-checked the result against a second data frame in which temp.df is "expanded" (each row repeated weight times). However, the lm outputs for the two data frames are different.

Did I misinterpret the weights parameter of the lm function? Can anyone tell me how to run the regression properly (i.e. without expanding the data frame manually) for a dataset presented like temp.df? Thanks.

asked Apr 22 '12 14:04

lokheart

1 Answer

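The coefficient estimates from the two fits agree. The problem is the degrees of freedom: lm treats weights as precision weights rather than frequency counts, so the residual Df is based on the 200 rows instead of the 1178 observations they represent, and the mean squares, F value and p-value come out wrong. This will correct the ANOVA table: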

# Start from the ANOVA table of the weighted fit
temp.df.lm.aov <- anova(temp.df.lm)
n.rows <- length(temp.df.lm.aov$Df)   # the last row is "Residuals"
# Residual Df = sum of weights - model Df - 1 (for the intercept)
temp.df.lm.aov$Df[n.rows] <- 
        sum(temp.df.lm$weights) -
        sum(temp.df.lm.aov$Df[-n.rows]) - 1
# Recompute everything that depends on Df
temp.df.lm.aov$`Mean Sq` <- temp.df.lm.aov$`Sum Sq`/temp.df.lm.aov$Df
temp.df.lm.aov$`F value`[1] <- temp.df.lm.aov$`Mean Sq`[1]/
                                        temp.df.lm.aov$`Mean Sq`[n.rows]
temp.df.lm.aov$`Pr(>F)`[1] <- pf(temp.df.lm.aov$`F value`[1],
                                 temp.df.lm.aov$Df[1],
                                 temp.df.lm.aov$Df[n.rows], lower.tail=FALSE)
temp.df.lm.aov
Analysis of Variance Table

Response: bp
            Df Sum Sq Mean Sq F value   Pr(>F)   
age          1   8741  8740.5  10.628 0.001146 **
Residuals 1176 967146   822.4

Compare with:

> anova(temp.df.expand.lm)
Analysis of Variance Table

Response: bp
            Df Sum Sq Mean Sq F value   Pr(>F)   
age          1   8741  8740.5  10.628 0.001146 **
Residuals 1176 967146   822.4                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
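The same Df correction can be applied to the coefficient standard errors. The block below is only a sketch under a few assumptions: it rebuilds the question's data (the drawn values depend on the R version's default sampling method, so they may not match the tables above), it expands the rows with base R indexing instead of plyr, and the names fit.w, fit.e, se.adj and so on are made up for illustration. Because the weighted and expanded fits share the same residual sum of squares and the same cross-product matrix, rescaling the weighted standard errors by sqrt(df.w / df.freq) should reproduce those of the expanded fit.

```r
set.seed(100)

# Same simulated layout as in the question (values may vary by R version)
temp.df <- data.frame(uid = 1:200,
                      bp = sample(100:200, size = 200, replace = TRUE),
                      age = sample(30:65, size = 200, replace = TRUE),
                      weight = sample(1:10, size = 200, replace = TRUE))

# Base-R expansion: repeat each row index `weight` times
temp.df.expand <- temp.df[rep(seq_len(nrow(temp.df)), temp.df$weight), ]

fit.w <- lm(bp ~ age, data = temp.df, weights = weight)  # weighted fit
fit.e <- lm(bp ~ age, data = temp.df.expand)             # expanded fit

# The point estimates agree between the two fits
coef.match <- isTRUE(all.equal(coef(fit.w), coef(fit.e)))

# Residual Df: number of rows - 2 versus sum of weights - 2
df.w <- df.residual(fit.w)
df.freq <- sum(temp.df$weight) - length(coef(fit.w))

# Rescale the weighted standard errors to the frequency-count Df
se.w <- coef(summary(fit.w))[, "Std. Error"]
se.adj <- se.w * sqrt(df.w / df.freq)
se.e <- coef(summary(fit.e))[, "Std. Error"]
se.match <- isTRUE(all.equal(se.adj, se.e))

# t statistics and p-values on the corrected Df
t.adj <- coef(fit.w) / se.adj
p.adj <- 2 * pt(abs(t.adj), df = df.freq, lower.tail = FALSE)
```

Here coef.match and se.match should both be TRUE, and t.adj and p.adj can be compared against summary(fit.e) without ever building the expanded data frame for the real analysis.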

I am a bit surprised this has not come up more often on R-help. Either that or my search strategy development powers are weakening with old age.

answered Oct 10 '22 00:10

IRTFM

Tags: r, linear-regression, weighted
