
Extract Group Regression Coefficients in R w/ PLYR

Tags:

r

plyr

I'm trying to run a regression for every zipcode in my dataset and save the coefficients to a data frame but I'm having trouble.

Whenever I run the code below, I get a data frame called "coefficients" with one row per zip code, but the intercept and slope in every row are identical: they equal the results of the pooled regression lm(Sealed$hhincome ~ Sealed$square_footage).

When I run the code as indicated in Ranmath's example at the link below, everything works as expected. I'm new to R after many years with Stata, so any help would be greatly appreciated :)

R extract regression coefficients from multiply regression via lapply command

library(plyr)
Sealed <- read.csv("~/Desktop/SEALED.csv")

x <- function(df) {
      lm(Sealed$hhincome ~ Sealed$square_footage)
}

regressions <- dlply(Sealed, .(Sealed$zipcode), x)
coefficients <- ldply(regressions, coef)
Patrick asked Nov 30 '25 02:11


2 Answers

Because dlply takes a ... argument that allows additional arguments to be passed to the function, you can make things even simpler:

dlply(Sealed,.(zipcode),lm,formula=hhincome~square_footage)

The first two arguments to lm are formula and data. Since formula is specified by name here, lm matches the next argument it is given (the relevant zipcode-specific chunk of Sealed) to data.
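For illustration, here is a minimal end-to-end sketch of this approach, including the ldply step from the question. The toy data frame is invented (SEALED.csv is not available here), and only the column names zipcode, hhincome, and square_footage come from the question.

```r
library(plyr)

# Hypothetical stand-in for SEALED.csv: three zip codes with different true slopes
set.seed(42)
Sealed <- data.frame(
  zipcode        = rep(c("10001", "10002", "10003"), each = 30),
  square_footage = runif(90, min = 500, max = 3000)
)
true_slope <- rep(c(10, 20, 30), each = 30)
Sealed$hhincome <- 20000 + true_slope * Sealed$square_footage +
  rnorm(90, sd = 2000)

# One lm fit per zip code; dlply passes each chunk as the first unnamed
# argument, which lm matches to `data` because `formula` is given by name
regressions <- dlply(Sealed, .(zipcode), lm, formula = hhincome ~ square_footage)

# Collect the coefficients: one row per zip code
coefficients <- ldply(regressions, coef)
print(coefficients)
```

The result should have one row per zip code, with columns for the zip code, the intercept, and the square_footage slope.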

Ben Bolker answered Dec 02 '25 16:12


You are applying the function:

x <- function(df) {
      lm(Sealed$hhincome ~ Sealed$square_footage)
}

to each subset of your data, so we shouldn't be surprised that the output each time is exactly

lm(Sealed$hhincome ~ Sealed$square_footage)

right? Try replacing Sealed with df inside your function. That way you're referring to the variables in each individual piece passed to the function, not the whole variable in the data frame Sealed.
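A sketch of the corrected function, under the same column names as the question. It is a small variation on the suggestion above: instead of writing df$hhincome ~ df$square_footage, it passes data = df, which has the same effect and keeps the coefficient names clean. The toy data is invented for illustration, and .(zipcode) is assumed as the grouping expression.

```r
library(plyr)

# Hypothetical toy data standing in for SEALED.csv
set.seed(42)
Sealed <- data.frame(
  zipcode        = rep(c("10001", "10002", "10003"), each = 30),
  square_footage = runif(90, min = 500, max = 3000)
)
true_slope <- rep(c(10, 20, 30), each = 30)
Sealed$hhincome <- 20000 + true_slope * Sealed$square_footage +
  rnorm(90, sd = 2000)

# Fit on the chunk that dlply passes in (df), not on the full Sealed data frame
x <- function(df) {
  lm(hhincome ~ square_footage, data = df)
}

regressions  <- dlply(Sealed, .(zipcode), x)
coefficients <- ldply(regressions, coef)

# Pooled fit for comparison: this is what the original function
# returned for every zip code
pooled <- coef(lm(hhincome ~ square_footage, data = Sealed))

print(coefficients)
print(pooled)
```

Comparing the two printed results is a quick check that each row of coefficients now reflects its own zip code rather than the whole data set.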

joran answered Dec 02 '25 18:12