I have a dataset (data frame) with 5 columns all containing numeric values.
I'm looking to run a simple linear regression for each pair in the dataset.
For example, if the columns were named A, B, C, D, E, I want to run lm(A~B), lm(A~C), lm(A~D), ..., lm(D~E), and then I want to plot the data for each pair along with the regression line.
I'm pretty new to R, so I'm sort of spinning my wheels on how to actually accomplish this. Should I use ddply or lapply? I'm not really sure how to tackle this.
Here's one solution using combn:
combn(names(DF), 2, function(x){lm(DF[, x])}, simplify = FALSE)
Example:
set.seed(1)
DF <- data.frame(A=rnorm(50, 100, 3),
B=rnorm(50, 100, 3),
C=rnorm(50, 100, 3),
D=rnorm(50, 100, 3),
E=rnorm(50, 100, 3))
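In case it is unclear why `lm(DF[, x])` works without a formula: when `lm` is given a data frame in place of a formula, it treats the first column as the response and the remaining columns as predictors. Here is a small check, as a sketch that rebuilds only the first two columns of the example data (the names `fit_df` and `fit_formula` are just illustrative):

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3),
                 B = rnorm(50, 100, 3))

# Data frame in place of a formula: first column is the response
fit_df <- lm(DF[, c("A", "B")])

# The same model written with an explicit formula
fit_formula <- lm(A ~ B, data = DF)

# Compare the estimated intercept and slope of the two fits
same_coefs <- isTRUE(all.equal(unname(coef(fit_df)),
                               unname(coef(fit_formula))))
same_coefs
```

So for a pair `x = c("A", "B")`, the call inside `combn` is fitting `A ~ B`.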
Update: adding @Henrik's suggestion (see comments)
# only the coefficients
> results <- combn(names(DF), 2, function(x){coefficients(lm(DF[, x]))}, simplify = FALSE)
> vars <- combn(names(DF), 2)
> names(results) <- vars[1 , ] # adding names to identify variables in the regression
> results
$A
(Intercept) B
103.66739418 -0.03354243
$A
(Intercept) C
97.88341555 0.02429041
$A
(Intercept) D
122.7606103 -0.2240759
$A
(Intercept) E
99.26387487 0.01038445
$B
(Intercept) C
99.971253525 0.003824755
$B
(Intercept) D
102.65399702 -0.02296721
$B
(Intercept) E
96.83042199 0.03524868
$C
(Intercept) D
80.1872211 0.1931079
$C
(Intercept) E
89.0503893 0.1050202
$D
(Intercept) E
107.84384655 -0.07620397
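The question also asks for a plot of each pair with its regression line, which the code above does not cover. One possible way to do it with base graphics is sketched below. It also names each fit after both variables (e.g. "A~B"), so that list elements are not all labelled by the first variable alone. The names `pair_list` and `fits` and the 2 x 5 panel layout are my own choices for illustration, not part of the original answer.

```r
set.seed(1)
DF <- data.frame(A = rnorm(50, 100, 3),
                 B = rnorm(50, 100, 3),
                 C = rnorm(50, 100, 3),
                 D = rnorm(50, 100, 3),
                 E = rnorm(50, 100, 3))

# All unordered pairs of column names, labelled "response~predictor"
pair_list <- combn(names(DF), 2, simplify = FALSE)
names(pair_list) <- vapply(pair_list, paste, character(1), collapse = "~")

# One simple linear regression per pair (first name = response)
fits <- lapply(pair_list, function(x) lm(DF[, x]))

# One scatterplot per pair, with the fitted line drawn on top
op <- par(mfrow = c(2, 5), mar = c(4, 4, 2, 1))
for (nm in names(fits)) {
  x <- pair_list[[nm]]
  plot(DF[[x[2]]], DF[[x[1]]], xlab = x[2], ylab = x[1], main = nm)
  abline(fits[[nm]], col = "red")
}
par(op)  # restore the previous graphics settings
```

With 5 columns this gives 10 panels. For a different number of columns the `mfrow` layout would need adjusting.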