I am using the following code (developed in a previous post) to run every possible linear regression between the first variable and each of the other variables, and to save the results in a new data frame.
library(broom)
library(dplyr)
x <- names(data[,-1])
out <- unlist(lapply(1, function(n) combn(x, 1, FUN=function(row)
  paste0("tlv ~ ", paste0(row, collapse = "+")))))
## get the regression coefficients
tmp1 = bind_rows(lapply(out, function(frml) {
  a = tidy(lm(frml, data=data))
  a$frml = frml
  return(a)
}))
reg_coeff2 <- tmp1
## Get regression results i.e. R2, AIC, BIC
tmp2 = bind_rows(lapply(out, function(frml) {
  a = glance(lm(frml, data=data))
  a$frml = frml
  return(a)
}))
reg_results2 <- tmp2
reg_results2$frml <- sub("tlv ~ ", "", reg_results2$frml)
The code works very well, but I would like to adapt it to do the following.
I have the following data frame (data):
structure(list(id = c(5309039, 5284969, 5300279, 5270289, 5259957,
5267086, 5173196), var1 = c(0, 0, 0, 0, 0, 0, 0), var2 = c(23,
24, 20, 32, 31, 37, 43), var3 = c(162, 154, 156, 154, 151.5,
171, 154), var4 = c(62.8, 52.7, 64.5, 70.9, 63, 66.2, 60.3),
tlv = c(1049, 978, 1131, 1292, 1228, 1593, 1265), form20 = c(1674.12110392683,
1517.06018080512, 1666.03606715029, 1726.99450999549, 1627.94506984781,
1754.74878787639, 1608.54623766777), form19 = c(1062.84280028848,
902.364998653641, 1054.58187260355, 1116.8664734097, 1015.66220125765,
1145.22454880977, 995.841345244203), form18 = c(1050.91941325579,
891.3634649201, 1026.84722464179, 1073.58291322486, 980.997498562542,
1147.23019335865, 971.271632531001), form17 = c(1404.10436829839,
1220.98291088203, 1419.72032143583, 1517.11065788694, 1386.31581471687,
1477.21675910098, 1347.52393410332), form16 = c(1248.12292187059,
1126.73082253566, 1229.80850901466, 1265.36558733196, 1194.92548170827,
1321.39733067342, 1187.52592495257), form15 = c(990.132,
866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407),
form14 = c(1590.6052, 1436.4718, 1582.993, 1830.3706, 1688.692,
1812.3808, 1786.5202), form13 = c(1300.81321145176, 1130.23869905075,
1292.03253463863, 1358.23586808642, 1250.66417156907, 1388.37813595599,
1277.89625553694), form12 = c(1329.6, 1104.4, 1272, 1322.8,
1195.5, 1487.4, 1195.6)), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
and I need to perform a linear regression between the variable tlv and each of the variables whose name starts with the prefix "form", excluding the other variables (i.e. var1, var2, var3, ...).
Consider the apply family to build the needed formulas for all possible combinations, then pass them into lm iteratively. Apart from the broom functions, the code below uses only base R:
indvar_list <- lapply(1:9, function(x) combn(paste0("form", 12:20), x, simplify = FALSE))
formulas_list <- rapply(indvar_list, function(x) as.formula(paste("tlv ~", paste(x, collapse="+"))))
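tmp1 <- do.call(rbind, lapply(formulas_list, function(f)
  # store the formula as text (deparse1 needs R >= 4.0.0); a formula object cannot be a data frame column
  transform(tidy(lm(f, data=data)), frml = deparse1(f))
))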
tmp2 <- do.call(rbind, lapply(formulas_list, function(f)
  # same pattern as tmp1, with the formula stored as text
  transform(glance(lm(f, data=data)), frml = deparse1(f))
))
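As a rough check on how many models the combination approach above will fit, here is a minimal base R sketch. It assumes the nine predictor names form12 to form20 from the question; the names form_vars, n_per_size and term_sets are only illustrative, and reformulate is swapped in for the paste/as.formula step.

```r
# Predictor names taken from the question's data frame
form_vars <- paste0("form", 12:20)

# All combinations of size 1 to 9, as in the answer above
indvar_list <- lapply(seq_along(form_vars),
                      function(k) combn(form_vars, k, simplify = FALSE))

# Number of candidate models per combination size, and in total
n_per_size <- lengths(indvar_list)
total <- sum(n_per_size)

# Flatten one level and build one formula per set of terms
term_sets <- unlist(indvar_list, recursive = FALSE)
formulas <- lapply(term_sets, reformulate, response = "tlv")

print(n_per_size)
print(total)
print(formulas[[1]])
print(formulas[[10]])
```

With nine candidate predictors this yields 511 formulas, so both do.call(rbind, ...) calls above fit 511 models each.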
We can make it shorter with map, reusing the vector of formula strings (out) from the question:
library(purrr)
tmp1 <- map_dfr(set_names(out, out), ~ lm(.x, data = data) %>% tidy, .id = 'fmla')
tmp2 <- map_dfr(set_names(out, out), ~ lm(.x, data = data) %>% glance, .id = 'fmla')
Or, if we need only the form variables: get the names of the columns that start with "form" (using startsWith), pass each one to reformulate to create the formula for lm, tidy the output, and add a "Var" column holding the column name. If we need the formula itself, assign the reformulate output to an object and use it later.
startsWith(names(data), "form") %>%
  magrittr::extract(names(data), .) %>%
  map_dfr(~ lm(reformulate(.x, 'tlv'), data = data) %>%
    tidy %>%
    mutate(Var = .x))
Similarly, change tidy to glance to get the model-level statistics.
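For comparison, the same idea can be sketched in base R only, keeping the reformulate output in an object so that the formula text can be stored next to the results. This is an illustration under assumptions, not the answer's method: it uses a four-column subset of the question's data (values copied from the structure() output), and the names form_vars, fmla and res are made up for the example.

```r
# Subset of the question's data: response, one non-"form" column, two "form" columns
data <- data.frame(
  tlv    = c(1049, 978, 1131, 1292, 1228, 1593, 1265),
  var2   = c(23, 24, 20, 32, 31, 37, 43),
  form15 = c(990.132, 866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407),
  form12 = c(1329.6, 1104.4, 1272, 1322.8, 1195.5, 1487.4, 1195.6)
)

# Keep only the column names that start with "form"
form_vars <- names(data)[startsWith(names(data), "form")]

# One single-predictor regression per "form" column
res <- do.call(rbind, lapply(form_vars, function(v) {
  fmla <- reformulate(v, response = "tlv")  # formula object, reused below
  fit  <- lm(fmla, data = data)
  data.frame(
    Var       = v,
    frml      = paste(deparse(fmla), collapse = " "),  # formula as text
    slope     = unname(coef(fit)[v]),
    r.squared = summary(fit)$r.squared,
    AIC       = AIC(fit),
    BIC       = BIC(fit)
  )
}))

print(res)
```

The result has one row per form column, and var2 is skipped because its name does not match the prefix.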