
Running multiple, simple linear regressions from dataframe in R

I have a dataset (data frame) with 5 columns all containing numeric values.

I'm looking to run a simple linear regression for each pair in the dataset.

For example, if the columns were named A, B, C, D, E, I want to run lm(A~B), lm(A~C), lm(A~D), ..., lm(D~E), and then plot the data for each pair along with the regression line.

I'm pretty new to R, so I'm spinning my wheels on how to actually accomplish this. Should I use ddply or lapply? I'm not really sure how to tackle this.

mrp asked Sep 22 '13 18:09


1 Answer

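Here's one solution using combn, which applies a function to every pair of column names. Passing the two-column data frame DF[, x] to lm fits the first column of the pair on the second: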

 combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)

Example:

set.seed(1)
DF <- data.frame(A=rnorm(50, 100, 3),
                 B=rnorm(50, 100, 3),
                 C=rnorm(50, 100, 3),
                 D=rnorm(50, 100, 3),
                 E=rnorm(50, 100, 3))
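The same pairwise fits can also be written with an explicit formula per pair, which keeps the full lm objects and gives every model a unique name. This is only a sketch of one way to do it: the names `var_pairs` and `models` are illustrative and not part of the original answer, and the example data is repeated so the block runs on its own.

```r
# Example data repeated from the answer so the block is self-contained
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3),
                 B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3),
                 D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# All unordered column pairs as a list: c("A", "B"), c("A", "C"), ...
var_pairs <- combn(names(DF), 2, simplify = FALSE)

# Fit first ~ second for each pair using an explicit formula
models <- lapply(var_pairs, function(p) {
  lm(reformulate(p[2], response = p[1]), data = DF)
})

# Name each model after its formula, e.g. "A~B", so no two names collide
names(models) <- vapply(var_pairs, paste, character(1), collapse = "~")

# Look up a single fit by name
coef(models[["A~B"]])
summary(models[["C~D"]])$r.squared
```

Keeping the whole model rather than only the coefficients means summary(), residuals() and predict() remain available for each pair later on.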

Update: adding @Henrik's suggestion (see comments)

# only the coefficients
> results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
> vars <- combn(names(DF), 2)
> names(results) <- vars[1 , ] # name each result after its response variable; the predictor appears in the coefficient names
> results
$A
 (Intercept)            B 
103.66739418  -0.03354243 

$A
(Intercept)           C 
97.88341555  0.02429041 

$A
(Intercept)           D 
122.7606103  -0.2240759 

$A
(Intercept)           E 
99.26387487  0.01038445 

$B
 (Intercept)            C 
99.971253525  0.003824755 

$B
 (Intercept)            D 
102.65399702  -0.02296721 

$B
(Intercept)           E 
96.83042199  0.03524868 

$C
(Intercept)           D 
 80.1872211   0.1931079 

$C
(Intercept)           E 
 89.0503893   0.1050202 

$D
 (Intercept)            E 
107.84384655  -0.07620397 
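The question also asks for a plot of each pair with its regression line, which the output above does not cover. One possible sketch with base graphics follows. It assumes the same example DF, and `plot_pair_fits` is a hypothetical helper name chosen for illustration, not a function from any package.

```r
# Example data repeated from the answer so the block is self-contained
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3),
                 B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3),
                 D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# Hypothetical helper: draws a scatterplot with the fitted line for every
# pair of columns and returns the fitted models invisibly, named like "A~B".
plot_pair_fits <- function(df) {
  var_pairs <- combn(names(df), 2, simplify = FALSE)

  # Arrange the panels in two rows; restore graphics settings on exit
  n_col <- ceiling(length(var_pairs) / 2)
  op <- par(mfrow = c(2, n_col), mar = c(4, 4, 2, 1))
  on.exit(par(op))

  fits <- lapply(var_pairs, function(p) {
    fit <- lm(reformulate(p[2], response = p[1]), data = df)
    # Predictor on the x axis, response on the y axis
    plot(df[[p[2]]], df[[p[1]]], xlab = p[2], ylab = p[1],
         main = paste(p[1], "~", p[2]))
    abline(fit, col = "red")  # regression line from the fitted model
    fit
  })
  names(fits) <- vapply(var_pairs, paste, character(1), collapse = "~")
  invisible(fits)
}

fits <- plot_pair_fits(DF)
```

With five columns this draws ten panels in a 2 x 5 grid. For a quick visual overview without the fitted lines, pairs(DF) is a one-line alternative.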
Jilber Urbina answered Sep 22 '22 18:09