 

What is the "{" class in R?

Tags: r, expression

Here is the code:

mf = function(..., expr) {
    expr = substitute(expr)
    print(class(expr))
    print(str(expr))
    expr
}
mf(a = 1, b = 2, expr = {matrix(NA, 4, 4)})

Output:

[1] "{"
length 2 {  matrix(NA, 4, 4) }
 - attr(*, "srcref")=List of 2
  ..$ :Class 'srcref'  atomic [1:8] 1 25 1 25 25 25 1 1
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860> 
  ..$ :Class 'srcref'  atomic [1:8] 1 26 1 41 26 41 1 1
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860> 
 - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860> 
 - attr(*, "wholeSrcref")=Class 'srcref'  atomic [1:8] 1 0 1 42 0 42 1 1
  .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fbcdbce3860> 
NULL
{
matrix(NA, 4, 4)
}

Apparently substitute(expr) returns something of class "{". What is this class exactly? Why is {matrix(NA, 4, 4)} of length 2? What do these strange attrs mean?

asked May 31 '15 19:05 by qed




2 Answers

The `{` class is the class of a braced block of code. Just looking at the classes, note the difference between these:

mf(a = 1, b = 2, expr = {matrix(NA, 4, 4)})
# [1] "{"
mf(a = 1, b = 2, expr = matrix(NA, 4, 4))
# [1] "call"

An object of class `{` can hold multiple statements. Its length() is the number of statements in the block plus one for the `{` itself at the start of the block. For example:

length(quote({matrix(NA, 4, 4)}))
# [1] 2
length(quote({matrix(NA, 4, 4); matrix(NA,3,3)}))
# [1] 3
length(quote({}))
# [1] 1
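Statements separated by newlines count exactly like statements separated by semicolons. A minimal check, parsing from text so the line breaks are explicit; a, b and c are placeholder names, and keep.source = FALSE is an assumption made only so that no source-reference attributes affect the comparison:

```r
# The same three statements, written on one line and on separate lines.
one_line   <- parse(text = "{a; b; c}", keep.source = FALSE)[[1]]
multi_line <- parse(text = "{\n  a\n  b\n  c\n}", keep.source = FALSE)[[1]]

length(one_line)    # 4: the `{` plus three statements
length(multi_line)  # 4 as well

# Elements 2 onward are the statements themselves.
multi_line[[3]]     # the symbol b
identical(one_line, multi_line)  # TRUE: the separator does not matter
```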

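The attributes "srcref", "srcfile" and "wholeSrcref" are source references: R attaches them when it keeps the source of parsed code (controlled by options("keep.source"), which defaults to TRUE in interactive sessions), so that it can track where code was defined and give informative debugging output and error messages. Each srcref is an integer vector locating a piece of code by line, byte and column; in the output above, the first srcref covers the `{` (byte 25 of line 1) and the second covers matrix(NA, 4, 4) (bytes 26 to 41). You can see the ?srcfile help page for more information about that.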
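A minimal sketch of where those attributes come from, assuming source references are requested explicitly with parse(keep.source = TRUE) rather than relying on the console default. The input text is illustrative; the first four srcref entries are first line, first byte, last line and last byte:

```r
# Parse a braced block while keeping source references.
exprs <- parse(text = "{matrix(NA, 4, 4)}", keep.source = TRUE)
blk <- exprs[[1]]            # the `{` call itself

class(blk)                   # "{" (implicit class; there is no class attribute)
refs <- attr(blk, "srcref")  # a list: one srcref for the `{`, one per statement
length(refs)                 # 2

# Position of the single statement inside the text.
as.integer(refs[[2]])[1:4]   # 1 2 1 17
as.character(refs[[2]])      # "matrix(NA, 4, 4)"
```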

answered Nov 04 '22 12:11 by MrFlick


What you're seeing is a reflection of the way R exposes its internal language structure through its own data structures.

The substitute() function returns the parse tree of an R expression. The parse tree is a tree of language elements. These can include literal values, symbols (basically variable names), function calls, and braced blocks. Here's a demonstration of all the R language elements as returned by substitute(), showing their types in all three of R's type classification schemes:

tmc <- function(x) c(typeof(x),mode(x),class(x));
tmc(substitute(TRUE));
## [1] "logical" "logical" "logical"
tmc(substitute(4e5L));
## [1] "integer" "numeric" "integer"
tmc(substitute(4e5));
## [1] "double"  "numeric" "numeric"
tmc(substitute(4e5i));
## [1] "complex" "complex" "complex"
tmc(substitute('a'));
## [1] "character" "character" "character"
tmc(substitute(somevar));
## [1] "symbol" "name"   "name"
tmc(substitute(T));
## [1] "symbol" "name"   "name"
tmc(substitute(sum(somevar)));
## [1] "language" "call"     "call"
tmc(substitute(somevec[1]));
## [1] "language" "call"     "call"
tmc(substitute(somelist[[1]]));
## [1] "language" "call"     "call"
tmc(substitute(somelist$x));
## [1] "language" "call"     "call"
tmc(substitute({blah}));
## [1] "language" "call"     "{"

Notes:

  • Note how all three type classification schemes are very similar, but subtly different. This can be a source of confusion. typeof() gives the storage type of the object, sometimes called the "internal" type (to be honest, it probably shouldn't be called "internal", because it is frequently exposed directly to the user at the R level; I would call it the "fundamental" or "underlying" type). mode() gives a similar classification scheme that everyone should probably ignore. class() gives the implicit class (if there's no class attribute) or the explicit class (if there is) of the object, which is used for S3 method lookup and is also sometimes examined directly by R code, independent of the S3 lookup process.
  • Note how TRUE is a logical literal, but T is a symbol, just like any other variable name, and just happens to be assigned to TRUE by default (and ditto for F and FALSE). This is why people sometimes recommend using TRUE and FALSE instead of T and F: T and F can be reassigned (but personally I prefer T and F for their concision; no one should ever reassign those!).
  • The astute reader will notice that in my demonstration of literals, I've omitted the raw type. This is because there's no such thing as a raw literal in R. In fact, there are very few ways to get hold of raw vectors in R; raw(), as.raw(), charToRaw(), and rawConnectionValue() are the only ways that I'm aware of, and if I used those functions in a substitute() call, they would be returned as "call" objects, just like in the sum(somevar) example, not as literal raw values. The same can be said for the list type; there's no such thing as a list literal (although there are many ways to acquire a list via function calls). Plain raw vectors return 'raw' for all three type classifications, and plain lists return 'list' for all three type classifications.
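The claims in the last two notes can be checked with a short sketch. It redefines the tmc() helper from above so the snippet runs on its own; the temporary reassignment of T is purely illustrative and is undone at the end:

```r
tmc <- function(x) c(typeof(x), mode(x), class(x))

# There is no raw or list literal: these come back as unevaluated calls.
tmc(substitute(as.raw(1)))   # "language" "call" "call"
tmc(substitute(list(1, 2)))  # "language" "call" "call"

# Once evaluated, raw vectors and lists report the same name in all three schemes.
tmc(as.raw(1))               # "raw" "raw" "raw"
tmc(list(1, 2))              # "list" "list" "list"

# T is an ordinary variable, so a global binding can shadow it; TRUE cannot be.
T <- FALSE
isTRUE(T)                    # FALSE
rm(T)                        # drop the shadowing binding; base's T is visible again
isTRUE(T)                    # TRUE
```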

Now, when you have a parse tree that is more complicated than a simple literal value or symbol (meaning it must be a function call or braced expression), you can generally examine the contents of that parse tree by coercing to list. This is how R exposes its internal language structure through its own data structures.

Diving into your example:

pt <- as.list(substitute({matrix(NA,4,4)}));
pt;
## [[1]]
## `{`
##
## [[2]]
## matrix(NA, 4, 4)

This makes it clear why length() returns 2: that's the length of the list that represents the parse tree. In general, the brace itself becomes the first list component, and the remaining list components are built from the statements within the braces, whether they are separated by semicolons or by newlines:

as.list(substitute({}));
## [[1]]
## `{`
##
as.list(substitute({a}));
## [[1]]
## `{`
##
## [[2]]
## a
##
as.list(substitute({a;b}));
## [[1]]
## `{`
##
## [[2]]
## a
##
## [[3]]
## b
##
as.list(substitute({a;b;c}));
## [[1]]
## `{`
##
## [[2]]
## a
##
## [[3]]
## b
##
## [[4]]
## c

Note that this is identical to how function calls work, except that for function calls the list components after the first are formed from the comma-separated arguments to the function call:

as.list(substitute(sum()));
## [[1]]
## sum
##
as.list(substitute(sum(1)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
as.list(substitute(sum(1,3)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
as.list(substitute(sum(1,3,5)));
## [[1]]
## sum
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 5

From the above it becomes clear that the first list component is actually a symbol representing the name of a function, for both braced expressions and function calls. In other words, the open brace is a function call, one which evaluates its arguments in order and returns the value of the last one. Just as square brackets are normal function calls with a convenient syntax built on top of them, the open brace is a normal function call with a convenient syntax built on top of it:

a <- 4:6;
a[2];
## [1] 5
`[`(a,2);
## [1] 5
{1;2};
## [1] 2
`{`(1,2);
## [1] 2
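Calling `{` directly also shows that it does more than hand back its last argument. It evaluates every argument in order in the calling environment, so earlier arguments can have side effects that later ones see, and an empty call returns NULL. A minimal sketch; the variable x is just for illustration:

```r
`{`(x <- 1, x + 1)   # 2: the assignment runs first, then x + 1 sees it
x                    # 1: the side effect of the first argument persists
is.null(`{`())       # TRUE: an empty block evaluates to NULL
is.primitive(`{`)    # TRUE: `{` is a primitive function
```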

Returning to your example, we can fully explore the parse tree by traversing the list structure that represents the parse tree. I just wrote a nice little recursive function that can do this very easily:

unwrap <- function(x) if (typeof(x) == 'language') lapply(as.list(x),unwrap) else x;
unwrap(substitute(3));
## [1] 3
unwrap(substitute(a));
## a
unwrap(substitute(a+3));
## [[1]]
## `+`
##
## [[2]]
## a
##
## [[3]]
## [1] 3
##
unwrap(substitute({matrix(NA,4,4)}));
## [[1]]
## `{`
##
## [[2]]
## [[2]][[1]]
## matrix
##
## [[2]][[2]]
## [1] NA
##
## [[2]][[3]]
## [1] 4
##
## [[2]][[4]]
## [1] 4

As you can see, the braced expression turns into a normal function call of the function `{`(), taking one argument, which is the single statement you coded into it. That statement consists of a single function call to matrix(), taking three arguments, each of which is a literal value: NA, 4, and 4. And that's the entire parse tree.
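The list view also works in reverse: as.call() turns a list whose first element is a function name into a call, so the same parse tree can be assembled by hand. A small sketch; the reference tree is parsed with keep.source = FALSE (an assumption made so that no srcref attributes get in the way of the comparison):

```r
# Build the inner call matrix(NA, 4, 4) from its parts.
inner <- as.call(list(as.name("matrix"), NA, 4, 4))

# Wrap it in a call to `{`, the same shape the parser produces for a braced block.
blk <- as.call(list(as.name("{"), inner))

reference <- parse(text = "{matrix(NA, 4, 4)}", keep.source = FALSE)[[1]]

class(blk)                # "{"
identical(blk, reference) # TRUE
eval(blk)                 # a 4 x 4 matrix of NA
```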

So now we can understand the meaning of the "{" class on a deep level: it represents an element of a parse tree that is a function call to the `{`() function. It happens to be classed differently from other function calls ("{" instead of "call"), but as far as I can tell, that has no significance anywhere. Also observe that the typeof() and mode() are identical ("language" and "call", respectively) between all parse tree elements representing function calls, for both `{`() and others alike.
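For what it's worth, `{` is not the only call with its own implicit class. R's ?class documentation lists the calls that get their function name as implicit class: if, while, for, =, <-, ( and {; every other call is simply "call". A quick check, where a, b and i are placeholder names (`=` is written in prefix form because quote(a = b) would be read as a named argument to quote()):

```r
classes <- c(
  brace  = class(quote({a})),
  paren  = class(quote((a))),
  if_    = class(quote(if (a) b)),
  for_   = class(quote(for (i in a) b)),
  while_ = class(quote(while (a) b)),
  assign = class(quote(a <- b)),
  eq     = class(quote(`=`(a, b))),
  plain  = class(quote(sum(a)))
)
classes
# brace "{", paren "(", if_ "if", for_ "for", while_ "while",
# assign "<-", eq "=", plain "call"
```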

answered Nov 04 '22 12:11 by bgoldst