
Adding a custom function to summarise in dplyr

Tags:

r

dplyr

I have a data frame like this, with different observations for each id:

library(dplyr)
df <- data.frame(id=c(1,1,1,1,1,2,2,3), v1= rnorm(8), v2=rnorm(8))

I then group by id:

by_id <- group_by(df, id)

Now I want to calculate mean and sd of the observations of v1 for each id. This is easy with summarise:

df2 <- summarise(by_id,
                    v1.mean=mean(v1),
                    v1.sd=sd(v1))

Now I want to add the slope of a linear regression of v1 on v2:

df2 <- summarise(by_id,
                   v1.mean=mean(v1),
                   v1.sd=sd(v1),
                   slope=as.vector(coef(lm(v1~v2,na.action="na.omit")[2])))

However, this fails. I think this is because one id (id=3) has only one observation, so a linear model cannot be fitted for it.

I also tried

   slope=ifelse(n()==1,0,as.vector(coef(lm(v1~v2,na.action="na.omit")[2]))))

but it does not work either. Is there an easy solution for this?

Note that lm might also fail when an id has more than one observation but, for example, v2 has a missing value.

spore234 asked Aug 05 '15 09:08


1 Answer

You can try this:

group_by(df, id) %>%
  do(fit = lm(v1 ~ v2, data = .)) %>%
  summarise(intercept = coef(fit)[1], slope = coef(fit)[2])
Source: local data frame [3 x 2]

   intercept     slope
1 -0.3116880 0.2698022
2 -1.2303663 0.4949600
3  0.3169372        NA

Note the use of do, and of . inside the lm call: do fits one model per group, and . stands for that group's data.
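If you would rather keep everything in a single summarise call, as in the question, one option is to wrap the fit in a small helper. The following is only a sketch: safe_slope is a made-up name (it is not part of dplyr), the seed is an assumption added for reproducibility, and returning NA for groups with fewer than two complete pairs is one possible convention. The guard matters mostly for missing values, since lm stops with an error when no complete cases remain.

```r
library(dplyr)

set.seed(1)  # assumed seed, only so that the example is reproducible
df <- data.frame(id = c(1, 1, 1, 1, 1, 2, 2, 3), v1 = rnorm(8), v2 = rnorm(8))

# Hypothetical helper (not a dplyr function): slope of y regressed on x,
# or NA when fewer than two complete (y, x) pairs are available.
safe_slope <- function(y, x) {
  ok <- complete.cases(y, x)            # drop pairs with a missing value
  if (sum(ok) < 2) return(NA_real_)     # not enough data to estimate a slope
  unname(coef(lm(y[ok] ~ x[ok]))[2])    # index the coefficients, not the lm object
}

df2 <- df %>%
  group_by(id) %>%
  summarise(v1.mean = mean(v1),
            v1.sd   = sd(v1),
            slope   = safe_slope(v1, v2))
```

Here the [2] is applied to the result of coef(), whereas in the question it sits inside coef() and therefore indexes the lm object itself. With this helper, id=3 gets NA for the slope instead of stopping the whole summarise.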

Mamoun Benghezal answered Oct 15 '22 02:10