Dynamic variable names in R regressions

Tags:

r

Being aware of the danger of using dynamic variable names, I am trying to loop over various regression models in which different variable specifications are chosen. Usually !!rlang::sym() solves this kind of problem for me just fine, but it somehow fails in regressions. A minimal example would be the following:

y= runif(1000) 
x1 = runif(1000) 
x2 = runif(1000) 

df2= data.frame(y,x1,x2)
summary(lm(y ~ x1+x2, data=df2)) ## works

var = "x1"
summary(lm(y ~ !!rlang::sym(var) + x2, data=df2)) # gives an error

My understanding was that !!rlang::sym(var) takes the value of var (namely x1) and inserts it into the code so that R treats it as a variable name (not a character string). But I seem to be wrong. Can anyone enlighten me?

asked Dec 05 '18 15:12

safex

2 Answers

Personally, I like to do this with some computing on the language. For me, a combination of bquote with eval is easiest (to remember).

var <- as.symbol(var)
eval(bquote(summary(lm(y ~ .(var) + x2, data = df2))))
#Call:
#lm(formula = y ~ x1 + x2, data = df2)
#
#Residuals:
#     Min       1Q   Median       3Q      Max 
#-0.49298 -0.26248 -0.00046  0.24111  0.51988 
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)    
#(Intercept)  0.50244    0.02480  20.258   <2e-16 ***
#x1          -0.01468    0.03161  -0.464    0.643    
#x2          -0.01635    0.03227  -0.507    0.612    
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 0.2878 on 997 degrees of freedom
#Multiple R-squared:  0.0004708,    Adjusted R-squared:  -0.001534 
#F-statistic: 0.2348 on 2 and 997 DF,  p-value: 0.7908

I find this superior to any approach that doesn't show the same call as summary(lm(y ~ x1+x2, data=df2)).
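Since the question is about looping over several specifications, the same bquote/eval idea can be wrapped in lapply. This is only a minimal sketch: the extra column x3, the seed, and the names candidates and fits are assumptions made for illustration and are not part of the original example.

```r
# Simulated data in the same shape as the question's example.
# x3 and the seed are hypothetical additions so there is something to loop over.
set.seed(1)
df2 <- data.frame(y = runif(1000), x1 = runif(1000),
                  x2 = runif(1000), x3 = runif(1000))

# Hypothetical set of variables to swap into the first position of the formula
candidates <- c("x1", "x3")

fits <- lapply(candidates, function(v) {
  # .(as.symbol(v)) is replaced by the bare variable name before evaluation
  eval(bquote(lm(y ~ .(as.symbol(v)) + x2, data = df2)))
})
names(fits) <- candidates

# The stored call contains the substituted variable name
fits$x1$call
```

Because the substitution happens before lm() is called, each fitted model records the formula with the actual variable name, which keeps the summary output readable when many models are compared.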

answered Sep 30 '22 11:09

Roland

The bang-bang operator !! only works with "tidy" functions, that is, functions built on rlang's tidy evaluation. It is not part of the core R language, so a base R function like lm() has no idea how to expand it. Instead, you need to wrap the call in a function that can do the expansion. rlang::expr is one such example:

rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2)))
# summary(lm(y ~ x1 + x2, data = df2))

Then you need to use rlang::eval_tidy to actually evaluate it:

rlang::eval_tidy(rlang::expr(summary(lm(y ~ !!rlang::sym(var) + x2, data=df2))))

# Call:
# lm(formula = y ~ x1 + x2, data = df2)
# 
# Residuals:
#     Min       1Q   Median       3Q      Max 
# -0.49178 -0.25482  0.00027  0.24566  0.50730 
# 
# Coefficients:
#               Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.4953683  0.0242949  20.390   <2e-16 ***
# x1          -0.0006298  0.0314389  -0.020    0.984    
# x2          -0.0052848  0.0318073  -0.166    0.868    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 0.2882 on 997 degrees of freedom
# Multiple R-squared:  2.796e-05,   Adjusted R-squared:  -0.001978 
# F-statistic: 0.01394 on 2 and 997 DF,  p-value: 0.9862

You can see this version preserves the expanded formula in the model object.
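To see why the bare !! in the question cannot work inside lm(), it may help to look at how base R itself reads it. The short sketch below uses only base R; the name e is just an illustrative variable.

```r
# In base R, `!` is logical negation, so `!!` is simply negation applied twice
!!TRUE   # TRUE
!!0      # FALSE: 0 is coerced to FALSE, then negated twice

# Quoting shows how the parser sees it: a call to `!` wrapping another call to `!`
e <- quote(!!x)
e[[1]]   # the function being called, i.e. the symbol for `!`
e[[2]]   # its argument, the inner call !x
```

Nothing in this parse tree asks R to substitute the value of a variable. Only functions that inspect the quoted expression themselves, such as rlang::expr, give !! its unquoting meaning.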

answered Sep 30 '22 12:09

MrFlick
