 

Inline expansion of variables in R

Tags:

r

I'm confused about when a value is treated as a variable and when it is treated as a string in R. In Ruby and Python, I'm used to a string always having to be quoted, and an unquoted name always being treated as a variable, i.e.:

a["hello"] => a["hello"]
b = "hi"
a[b] => a["hi"]

But in R, this is not the case, for example

a$b <- c(1,2,3)

Here b is the name of the column, not the variable b.

c <- "b"
a$c => column not found (it's looking for column c, not b, which is the value of the variable c)

(I know that in this specific case I can use a[c], but there are many other cases, such as ggplot(a, aes(x=c)), where I want to plot the column whose name is the value of c, not a column named c.)

In other StackOverflow questions, I've seen functions like quote and substitute mentioned.

My question is: Is there a general way of "expanding" a variable and making sure the value of the variable is used, instead of the name of the variable? Or is that just not how things are done in R?

asked Jan 24 '26 01:01 by Stian Håklev


2 Answers

In your example, a$b is syntactic sugar for a[["b"]]. That's a special feature of the $ symbol when used with lists. The second form does what you expect: a[[b]] will return the element of a whose name equals the value of the variable b, rather than the element whose name is "b".
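A minimal sketch of the list case, assuming a made-up list a with elements named "b" and "hi" (the names and values are placeholders chosen only for illustration):

```r
# Hypothetical list with two named elements
a <- list(b = "element named b", hi = "element named hi")
b <- "hi"  # a variable whose value is the string "hi"

a$b        # $ uses the literal name "b"        -> "element named b"
a[["b"]]   # same element, with a quoted name    -> "element named b"
a[[b]]     # [[ evaluates b, i.e. a[["hi"]]      -> "element named hi"
```

The only difference between the last two lines is whether the index is a quoted string or a variable; [[ evaluates whatever it is given, while $ never does.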

Data frames are similar: for a data frame a, the $ operator refers to the column names, so a$b is the same as a[, "b"]. To refer to the column of a named by the value of the variable b, use a[, b].
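The same idea for data frames, sketched with a toy data frame whose column names "b" and "hi" are invented for the example; the variable col is a hypothetical stand-in for the question's c (a different name is used only to avoid confusion with the base function c()):

```r
# Toy data frame; column names are made up for illustration
a <- data.frame(b = c(1, 2, 3), hi = c(10, 20, 30))
col <- "hi"   # a variable holding a column name

a$b          # column literally named "b"               -> 1 2 3
a[, "b"]     # same column via [ with a quoted name     -> 1 2 3
a[, col]     # column whose name is the value of col    -> 10 20 30
a[[col]]     # [[ also works on data frames             -> 10 20 30
```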

answered Jan 25 '26 18:01 by Tyler


The reason your $ example doesn't work is quite subtle, and it differs from most other situations in R, where you can simply use a function like get, which was designed for that purpose. Calling a$b is equivalent to calling

`$`(a, b)
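To see this equivalence concretely, here is a small sketch using a hypothetical data frame with columns "b" and "x". A variable b is deliberately defined with the value "x" to show that $ ignores it; obj_name is likewise a hypothetical variable used to illustrate get():

```r
# Hypothetical data frame with columns "b" and "x"
a <- data.frame(b = c(1, 2, 3), x = c(4, 5, 6))
b <- "x"   # this variable is NOT consulted by $

# Prefix form of a$b: the second argument is taken as a name, never evaluated,
# so this returns column "b" (1 2 3), not column "x"
`$`(a, b)
identical(`$`(a, b), a$b)   # TRUE

# get() turns a string into the object it names
obj_name <- "a"   # hypothetical variable holding an object's name
get(obj_name)     # returns the data frame a itself
```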

This reminds us that in R everything is an object: $ is a function, and it takes two arguments. If we check the R source code, we can see that calling a$c and expecting R to evaluate c to "b" will never work, because the source states:

/* The $ subset operator.  
   We need to be sure to only evaluate the first argument.  
   The second will be a symbol that needs to be matched, not evaluated.  
*/

It achieves this using the following:

if (isSymbol(nlist))
    SET_STRING_ELT(input, 0, PRINTNAME(nlist));
else if (isString(nlist))
    SET_STRING_ELT(input, 0, STRING_ELT(nlist, 0));
else {
    errorcall(call, _("invalid subscript type '%s'"),
              type2char(TYPEOF(nlist)));
}

nlist is the second argument you passed to $ (in this case c), as received by do_subset3, the C function that $ maps to. Because c is a symbol, its name is copied into a string ("c") without ever being evaluated, so $ looks for an element called "c". If the argument is already a string, that string is used as-is.
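Both branches can be observed from R itself. The sketch below assumes a toy data frame and a hypothetical variable col; do.call() and bquote() are standard base R functions, shown here as two possible ways (not the only ones) to make sure $ ends up with the value of the variable rather than its name:

```r
# Toy data frame and a variable holding a column name
a <- data.frame(b = c(1, 2, 3), x = c(4, 5, 6))
col <- "b"

# Symbol branch: the name "col" itself is matched, and no such column exists
a$col                          # NULL

# String branch: do.call() evaluates its argument list first, so $ receives
# the string "b" rather than the symbol col
do.call(`$`, list(a, col))     # column b: 1 2 3

# Alternatively, build the call a$b with bquote(), then evaluate it
expr <- bquote(a$.(as.name(col)))
expr                           # prints a$b
eval(expr)                     # column b: 1 2 3
```

The bquote() route is one form of the "inline expansion" the question asks about: .() splices the value in while the call is being constructed, before $ ever sees it.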

answered Jan 25 '26 18:01 by Simon O'Hanlon