
Performing all possible linear regressions between 1 variable and a list of variables

Tags:

r

I am using the following code (developed in a previous post) to perform all possible linear regressions between the first variable and each of the other variables, and to save the results in a new data frame.

library(broom)
library(dplyr)
x <- names(data[,-1])
out <- unlist(lapply(1, function(n) combn(x, 1, FUN=function(row) 
          paste0("tlv ~ ", paste0(row, collapse = "+")))))
## get the regression coefficients
tmp1 = bind_rows(lapply(out, function(frml) {
      a = tidy(lm(frml, data=data))
      a$frml = frml
      return(a)
    }))
reg_coeff2 <- tmp1
## Get regression results i.e. R2, AIC, BIC
tmp2 = bind_rows(lapply(out, function(frml) {
      a = glance(lm(frml, data=data))
      a$frml = frml
      return(a)
    }))
reg_results2 <- tmp2
reg_results2$frml <- sub("tlv ~ ", "", reg_results2$frml)

The code works very well, but I would like to adapt it to do the following.

I have the following data frame (data):

structure(list(id = c(5309039, 5284969, 5300279, 5270289, 5259957, 
5267086, 5173196), var1 = c(0, 0, 0, 0, 0, 0, 0), var2 = c(23, 
24, 20, 32, 31, 37, 43), var3 = c(162, 154, 156, 154, 151.5, 
171, 154), var4 = c(62.8, 52.7, 64.5, 70.9, 63, 66.2, 60.3), 
    tlv = c(1049, 978, 1131, 1292, 1228, 1593, 1265), form20 = c(1674.12110392683, 
    1517.06018080512, 1666.03606715029, 1726.99450999549, 1627.94506984781, 
    1754.74878787639, 1608.54623766777), form19 = c(1062.84280028848, 
    902.364998653641, 1054.58187260355, 1116.8664734097, 1015.66220125765, 
    1145.22454880977, 995.841345244203), form18 = c(1050.91941325579, 
    891.3634649201, 1026.84722464179, 1073.58291322486, 980.997498562542, 
    1147.23019335865, 971.271632531001), form17 = c(1404.10436829839, 
    1220.98291088203, 1419.72032143583, 1517.11065788694, 1386.31581471687, 
    1477.21675910098, 1347.52393410332), form16 = c(1248.12292187059, 
    1126.73082253566, 1229.80850901466, 1265.36558733196, 1194.92548170827, 
    1321.39733067342, 1187.52592495257), form15 = c(990.132, 
    866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407), 
    form14 = c(1590.6052, 1436.4718, 1582.993, 1830.3706, 1688.692, 
    1812.3808, 1786.5202), form13 = c(1300.81321145176, 1130.23869905075, 
    1292.03253463863, 1358.23586808642, 1250.66417156907, 1388.37813595599, 
    1277.89625553694), form12 = c(1329.6, 1104.4, 1272, 1322.8, 
    1195.5, 1487.4, 1195.6)), row.names = c(NA, -7L), class = c("tbl_df", 
"tbl", "data.frame"))

I need to perform a linear regression between the variable tlv and each of the variables whose name starts with the prefix "form", excluding all the other variables (i.e. var1, var2, var3, ...).

Mariano C Giglio asked Dec 04 '25 00:12


2 Answers

Consider the apply family to build the formulas for all possible combinations of predictors, then pass each formula to lm iteratively. Apart from the broom functions, the code below uses only base R:

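library(broom)  # provides tidy() and glance()

# All combinations of the nine "form" predictors, taken 1 to 9 at a time
indvar_list <- lapply(1:9, function(x) combn(paste0("form", 12:20), x, simplify = FALSE))

# One formula per combination, e.g. tlv ~ form12 + form13
formulas_list <- rapply(indvar_list, function(x) as.formula(paste("tlv ~", paste(x, collapse="+"))))

# Coefficient table for every model; the formula is stored as text
# because a formula object cannot be used as a data frame column
tmp1 <- do.call(rbind, lapply(formulas_list, function(f)
   transform(tidy(lm(f, data=data)), frml = deparse(f, width.cutoff = 500L))
))

# Model-level statistics (R2, AIC, BIC, ...) for every model
tmp2 <- do.call(rbind, lapply(formulas_list, function(f)
   transform(glance(lm(f, data=data)), frml = deparse(f, width.cutoff = 500L))
))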
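If only the one-predictor models are needed (tlv against each "form" column in turn, as the question describes), the same idea can be sketched in base R alone. This is a minimal illustration rather than a definitive implementation: the data frame below is a four-column subset of the question's data, and the names form_vars, fits and reg_results are made up for this example.

```r
# Subset of the question's data: the response, one non-"form" column,
# and two of the "form" columns
data <- data.frame(
  tlv    = c(1049, 978, 1131, 1292, 1228, 1593, 1265),
  var2   = c(23, 24, 20, 32, 31, 37, 43),
  form15 = c(990.132, 866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407),
  form12 = c(1329.6, 1104.4, 1272, 1322.8, 1195.5, 1487.4, 1195.6)
)

# Keep only the column names that start with "form"
form_vars <- grep("^form", names(data), value = TRUE)

# Fit one simple regression per "form" column
fits <- lapply(form_vars, function(v) {
  lm(reformulate(v, response = "tlv"), data = data)
})
names(fits) <- form_vars

# Collect a few model-level statistics, one row per model
reg_results <- do.call(rbind, lapply(form_vars, function(v) {
  fit <- fits[[v]]
  data.frame(
    frml      = v,
    r.squared = summary(fit)$r.squared,
    AIC       = AIC(fit),
    BIC       = BIC(fit)
  )
}))
```

Here reformulate(v, response = "tlv") builds the formula tlv ~ v from the column name, so var2 and any other non-"form" column never enters a model.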
Parfait answered Dec 06 '25 14:12


We can make it shorter with map_dfr from purrr, reusing the out vector of formula strings from the question:

library(purrr)
tmp1 <- map_dfr(set_names(out, out),  ~ lm(.x, data = data) %>% tidy, .id = 'fmla')
tmp2 <- map_dfr(set_names(out, out),  ~ lm(.x, data = data) %>% glance, .id = 'fmla')

Or, if we need only the "form" variables, get the names of the columns that start with "form" using startsWith, pass each name to reformulate to create the formula for lm, tidy the output, and create a "Var" column holding the column name. (If we need the formula itself, assign the reformulate output to an object and use it later.)

startsWith(names(data), "form") %>%
    magrittr::extract(names(data), .) %>%
    map_dfr(~  lm(reformulate(.x, 'tlv'), data = data) %>% 
                  tidy %>%
                  mutate(Var = .x))

Similarly, replace tidy with glance to get the model-level statistics (R2, AIC, BIC).
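For reference, a glance-based variant might look like the sketch below. It assumes broom, dplyr and purrr are installed; the data frame is a small subset of the question's data, and form_vars and reg_results are names chosen only for this illustration.

```r
library(broom)
library(dplyr)
library(purrr)

# Subset of the question's data: the response, one non-"form" column,
# and two of the "form" columns
data <- data.frame(
  tlv    = c(1049, 978, 1131, 1292, 1228, 1593, 1265),
  var2   = c(23, 24, 20, 32, 31, 37, 43),
  form15 = c(990.132, 866.003, 1011.025, 1089.681, 992.59, 1031.918, 959.407),
  form12 = c(1329.6, 1104.4, 1272, 1322.8, 1195.5, 1487.4, 1195.6)
)

# Names of the predictors whose name starts with "form"
form_vars <- names(data)[startsWith(names(data), "form")]

# One model per predictor; glance() returns a one-row summary per model,
# and Var records which predictor the row belongs to
reg_results <- map_dfr(form_vars, function(v) {
  fit <- lm(reformulate(v, response = "tlv"), data = data)
  mutate(glance(fit), Var = v)
})
```

The result has one row per "form" column, with columns such as r.squared, AIC and BIC alongside Var.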

akrun answered Dec 06 '25 16:12


