R: specifying variable name in function parameter for a function of general (universal) use

Question

Here is my small function and data. Please note that I want to design a function for general use, not just for personal use.

dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)

myfun <- function (dataframe, varA, varB) {
  daf2 <- data.frame(A = dataframe$A * dataframe$B,
                     B = dataframe$C * dataframe$D)
  anv1 <- lm(varA ~ varB, daf2)
  print(anova(anv1))
}

myfun (dataframe = dataf, varA = A, varB = B)

Error in eval(expr, envir, enclos) : object 'A' not found

It works when I specify data$variable, but I do not want to require that, because the user would then have to write both the data frame and the variable name in the function call.

 myfun (dataframe = dataf, varA = dataf$A, varB = dataf$B)
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)    
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Warning message:
In anova.lm(anv1) :
  ANOVA F-tests on an essentially perfect fit are unreliable

What is the best practice in this situation? Can I attach the data frame inside the function? What are the disadvantages or potential conflicts and dangers of doing so? See the masking message in the output below. I believe that once the data frame is attached it will remain attached for the remainder of the session, right? The function provided here is just an example; I need more downstream analysis in which variable names from different data frames can be, and should be, identical. I am looking for a programmer's solution to this.

myfun <- function (dataframe, varA, varB) {
  attach(dataframe)
  daf2 <- data.frame(A = A * B, B = C * D)
  anv1 <- lm(varA ~ varB, daf2)
  return(anova(anv1))
}

myfun (dataframe = dataf, varA = A, varB = B)

The following object(s) are masked from 'dataframe (position 3)':

    A, B, C, D
Analysis of Variance Table

Response: varA
          Df Sum Sq Mean Sq    F value    Pr(>F)    
varB       1   82.5    82.5 1.3568e+33 < 2.2e-16 ***
Residuals  8    0.0     0.0                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Warning message:
In anova.lm(anv1) :
  ANOVA F-tests on an essentially perfect fit are unreliable

Nick Sabbe · Accepted Answer

Let's investigate your original function and call (see the comments I added), assuming you mean to pass the names of your columns of interest to the function:

myfun <- function (dataframe, varA, varB) {
  # On the next line you use A and B, but these should be whatever is
  # passed in as varA and varB, no?
  daf2 <- data.frame(A = dataframe$A * dataframe$B, B = dataframe$C * dataframe$D)
  # So, as a correction, we need:
  colnames(daf2) <- c(varA, varB)
  # The first argument to lm is a formula. Used like this, it refers to
  # columns _named_ varA and varB, not to the columns named by the
  # _contents_ of varA and varB!
  anv1 <- lm(varA ~ varB, daf2)
  # What we really want is to build a formula from the contents of
  # varA and varB. We have to do this by building up a character string:
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), daf2)
  print(anova(anv1))
}
# Here you pass A and B because you are used to being able to do that in a
# formula (as in lm). But a great deal of work is done to make that happen
# in a formula, and it doesn't work in most of the rest of R, so you need to
# pass the names as character strings instead:
myfun (dataframe = dataf, varA = A, varB = B)
# becomes:
myfun (dataframe = dataf, varA = "A", varB = "B")

Note: in the above, I left the original code in place, so you may have to remove some of it to avoid the errors you were originally getting. The essence of your problem is that you should always pass column names as character strings, and use them as such. This is one of the places where the syntactic sugar of formulas in R gets people into bad habits and misunderstandings...
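As a rough illustration of passing column names as character strings, here is a minimal sketch that builds the formula with base R's reformulate() instead of paste() and formula(). The function name myfun_chr and the use of the built-in mtcars data are assumptions made for this example only; they are not part of the original code.

```r
# Hypothetical variant of myfun: column names arrive as character strings.
myfun_chr <- function(dataframe, varA, varB) {
  # Fail early if a requested column does not exist in the data frame.
  stopifnot(all(c(varA, varB) %in% names(dataframe)))
  # reformulate(termlabels, response) builds the formula varA ~ varB
  # from the strings held in varA and varB.
  frm <- reformulate(varB, response = varA)
  anova(lm(frm, data = dataframe))
}

# mtcars ships with R; "mpg" and "wt" are two of its columns.
res <- myfun_chr(mtcars, varA = "mpg", varB = "wt")
res
```

The same function works unchanged on any data frame, because nothing inside it hard-codes a column name.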

Now, as for an alternative: the only place the variable names are actually used is in the formula. As such, you can simplify matters further, if you don't mind some slight cosmetic differences in the results that you can clean up later: there is no need to pass the column names at all!

myfun <- function (dataframe) {
              daf2 <- data.frame (A = dataframe$A*dataframe$B, B=dataframe$C*dataframe$D)
              #now we know that columns A and B simply exist in data.frame daf2!!
              anv1 <- lm(A ~ B, daf2)
              print(anova(anv1))
             }

As a final piece of advice: I would refrain from calling print on your last statement. If you leave it out and call the function directly from the R command line, R will print the result for you anyway. As an added advantage, you can perform further work with the object returned from your function.
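To show what working with the returned object can look like, here is a small sketch. The helper name fit_anova and the mtcars columns are assumptions chosen for illustration; the point is only that the anova table is an ordinary object whose columns can be indexed.

```r
# Hypothetical helper: the last expression is the return value, no print().
fit_anova <- function(dataframe, varA, varB) {
  frm <- formula(paste(varA, varB, sep = "~"))
  anova(lm(frm, data = dataframe))
}

# Assigning the result suppresses auto-printing; the table is kept in tab.
tab <- fit_anova(mtcars, "mpg", "wt")

# The anova table behaves like a data frame, so single values can be extracted.
p_value <- tab[["Pr(>F)"]][1]
f_value <- tab[["F value"]][1]
```

Typing tab at the prompt afterwards prints the table as usual.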

Cleaned Function with trial:

dataf <- data.frame (A= 1:10, B= 21:30, C= 51:60, D = 71:80)
myfun <- function (dataframe, varA, varB) {
  frm <- paste(varA, varB, sep = "~")
  anv1 <- lm(formula(frm), dataframe)
  anova(anv1)
}
myfun (dataframe = dataf, varA = "A", varB = "B")
myfun (dataframe = dataf, varA = "A", varB = "D")
myfun (dataframe = dataf, varA = "B", varB = "C")

Carl Witthoft · Answer

You could always go the (horrors) parse() route:

foo <- data.frame(one = 1:5, two = 6:10)
bar <- function(y) eval(parse(text = paste('foo$', y, sep = '')))

Which is to say, inside your function, grab the arguments to the function and build up the internal data frame or pairs of vectors of data you want using the eval(parse(...)) setup.
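For comparison, here is a sketch of the same lookup done both ways. Passing the data frame in as the argument df is an assumption added here so the example does not rely on a global foo; bar_parse and bar_index are hypothetical names. The second version uses the [[ operator, which accepts a column name held in a character string and avoids building code from text.

```r
foo <- data.frame(one = 1:5, two = 6:10)

# The eval(parse()) route: build the text "df$two" and evaluate it.
bar_parse <- function(df, y) eval(parse(text = paste0("df$", y)))

# The indexing route: look the column up by its name directly.
bar_index <- function(df, y) df[[y]]

# Both return the same column for the same name.
identical(bar_parse(foo, "two"), bar_index(foo, "two"))
```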

Tags: variables, dataframe, r, functional-programming

jon
