Default name concatenation

If I have a named vector

v <- c(a = 1, b = 2)

And I add them

s <- v[2] + v[1]

The result is a vector of length one whose element is named after the first operand in the arithmetic, here "b". You can avoid this by extracting the elements with double brackets.

Regardless, if I then try to make a new named vector with c()

v <- c(v, sum = s)

The resulting name for the sum element is not "sum", but "sum.b".

This behavior is undesirable, since I have specifically indicated that I want this element to be named sum.

If instead I add the element like this:

v["sum"] <- s

I get the desired behavior.

Why does R concatenate the name of the object and the name provided using c(), and why does this differ from adding the element using a new name in brackets? This is not to ask how to get rid of that behavior (I can do that with double brackets or unname()), but what principles are behind it, and in what other circumstances can I expect this to occur?

asked Sep 18 '17 13:09 by tef2128


1 Answer

Attribute preservation rules for operator and combine functions

Through the transformations effected by operator functions (e.g. `+` <- function(e1, e2) {}) and the combine function (c <- function(...) {}), the designers have attempted to preserve object attributes (most commonly names).

Operator functions

For the operator functions, the rules are as follows:

  • rule 1: if length(answer) == length(e1) & length(answer) > length(e2) then use e1 attributes.

  • rule 2: if length(answer) == length(e2) & length(answer) > length(e1) then use e2 attributes.

  • rule 3: if length(answer) == length(e1) & length(answer) == length(e2) then use both e1 and e2 attributes, but with e1 attributes taking precedence.

First, set up named and unnamed vectors. Use multi-element objects as they better demonstrate the application of the rules than the single-element objects used in the question.

print(a <- setNames(1:2, sprintf("a%s", 1:2)))
# a1 a2 
#  1  2
print(b <- setNames(1:4, sprintf("b%s", 1:4))) 
# b1 b2 b3 b4
#  1  2  3  4
print(u <- 1:2) # unnamed
# [1] 1 2
print(x <- setNames(1:4, rep("x", 4))) 
# x x x x
# 1 2 3 4

Then some examples:

#' rule 1
names(a + b[1]) # [1] "a1" "a2"
names(b + a[1]) # [1] "b1" "b2" "b3" "b4"
names(u + a[1]) # NULL

#' rule 2
names(a[1] + b) # [1] "b1" "b2" "b3" "b4"
names(b[1] + a) # [1] "a1" "a2"
names(u[1] + a) # [1] "a1" "a2"

#' rule 3
names(a + b[1:2]) # [1] "a1" "a2"
names(b[1:2] + a) # [1] "b1" "b2"
names(u + a)      # [1] "a1" "a2" ## e1 is unnamed so use e2's names
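
The three operator rules can be condensed into a small predictor and checked against what `+` actually returns. This is only a sketch: predict_op_names is a hypothetical helper, not part of base R, and it assumes non-empty atomic vectors.

```r
# Hypothetical helper: predicts names(e1 + e2) from the three rules.
# Assumes both operands are non-empty atomic vectors.
predict_op_names <- function(e1, e2) {
  n <- max(length(e1), length(e2))   # length of the answer
  # rules 1 and 3: e1 is as long as the answer and carries names
  if (length(e1) == n && !is.null(names(e1))) return(names(e1))
  # rule 2, or rule 3 when e1 is unnamed: fall back to e2's names
  if (length(e2) == n) return(names(e2))
  NULL                               # rule 1 with an unnamed e1
}

a <- setNames(1:2, c("a1", "a2"))
b <- setNames(1:4, c("b1", "b2", "b3", "b4"))
u <- 1:2

identical(predict_op_names(a, b[1]), names(a + b[1]))  # rule 1
identical(predict_op_names(a[1], b), names(a[1] + b))  # rule 2
identical(predict_op_names(u, a),    names(u + a))     # rule 3, e1 unnamed
```

Each comparison should agree with the corresponding example above.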

Operator functions do not preserve argument names and actually seem to ignore them entirely (order alone matters), even those from the function definition (e1 and e2).

names(`+`(v1 = u, v2 = 0))      # NULL
names(`+`(e2 = a, e1 = b[1:2])) # [1] "a1" "a2" ## despite e1 being b[1:2]

Combine function

For the combine function, it is first important to note that argument names are not ignored as they are with the operator functions, and that both the argument and its elements can be named. The rules are as follows:

  • rule 1: if argument is unnamed and its elements are unnamed, no names used.

  • rule 2: if argument is unnamed but its elements are named, use element names.

  • rule 3: if argument is named and its elements are unnamed, use argument name for arguments of length == 1, and argument name with sequential numeric suffixes for arguments of length > 1.

  • rule 4: if argument is named and its elements are named, use names joined by dots.

First a function to show argument naming:

c... <- function(...) {match.call(expand.dots=FALSE)$...}

Some examples:

# rules 1 and 2
names(c...(u, a)) # NULL, all arguments unnamed
names(c(u, a))    # c("", "", "a1", "a2")

# rules 3 and 4
names(c...(v1 = u[1], v2 = u, v3 = a)) # c("v1", "v2", "v3"), all arguments named
names(c(v1 = u[1], v2 = u, v3 = a))    # c("v1", "v21", "v22", "v3.a1", "v3.a2")

# all rules
names(c...(u, v1 = u, a, v2 = a)) # c("", "v1", "", "v2") ## some arguments named
names(c(u, v1 = u, a, v2 = a))    # c("", "", "v11", "v12", "a1", "a2", "v2.a1", "v2.a2")
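
The four combine rules can likewise be written out as a predictor and compared with c() itself. Again a sketch only: predict_c_names is a hypothetical helper, not part of base R, and it assumes non-empty atomic vectors whose elements are either all named or all unnamed.

```r
# Hypothetical helper: predicts names(c(...)) from the four rules.
# Assumes non-empty atomic vectors, each either fully named or fully unnamed.
predict_c_names <- function(...) {
  args <- list(...)
  arg_names <- names(args)
  if (is.null(arg_names)) arg_names <- rep("", length(args))
  out <- unlist(Map(function(x, nm) {
    el <- names(x)
    if (!nzchar(nm)) {
      # rules 1 and 2: unnamed argument, keep element names (or blanks)
      if (is.null(el)) rep("", length(x)) else el
    } else if (is.null(el)) {
      # rule 3: named argument, unnamed elements
      if (length(x) == 1) nm else paste0(nm, seq_along(x))
    } else {
      # rule 4: named argument, named elements
      paste(nm, el, sep = ".")
    }
  }, args, arg_names), use.names = FALSE)
  # c() returns no names at all when nothing is named
  if (all(!nzchar(out))) NULL else out
}

a <- setNames(1:2, c("a1", "a2"))
u <- 1:2

identical(predict_c_names(u, a), names(c(u, a)))
identical(predict_c_names(v1 = u[1], v2 = u, v3 = a),
          names(c(v1 = u[1], v2 = u, v3 = a)))
identical(predict_c_names(u, v1 = u, a, v2 = a),
          names(c(u, v1 = u, a, v2 = a)))
```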

It should be noted that the rules are designed to preserve attributes as much as possible, but the intention is not to create names as unique identifiers.

 # ambiguities (rules 2 and 4)
 names(c(x, v1 = x)) # c("x", "x", "x", "x", "v1.x", "v1.x", "v1.x", "v1.x")
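
If unique names are needed afterwards, one option is to post-process the result with base R's make.unique(). This is an extra step suggested here, not something the rules above do for you.

```r
x <- setNames(1:4, rep("x", 4))
y <- c(x, v1 = x)

anyDuplicated(names(y)) > 0        # TRUE: c() leaves the duplicates in place

# make.unique() appends numeric suffixes to repeated names
names(y) <- make.unique(names(y))
anyDuplicated(names(y)) == 0       # TRUE: every name is now distinct
```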

When operator and combine functions are used together, the rules for each are applied in the order of evaluation. So in c(a) + c(b), which is equivalent to `+`(c(a), c(b)), the combine function rules are applied first, whereas in c(a + b), which is equivalent to c(`+`(a, b)), the operator function rules are applied first.

# c() first, so argument names used
names(c(v1=a) + c(v2=b))    # c("v2.b1", "v2.b2", "v2.b3", "v2.b4")
names(`+`(c(v1=a),c(v2=b))) # c("v2.b1", "v2.b2", "v2.b3", "v2.b4") 

# `+`() first, so argument names ignored
names(c((v1=a) + (v2=b))) # c("b1", "b2", "b3", "b4")
names(c(`+`(v1=a, v2=b))) # c("b1", "b2", "b3", "b4")

Question and answer

The questions were, (1) "why does R concatenate the name of the object and the name provided using c()", and (2) "why does this differ from adding the element using a new name in brackets?"

  1. The principle for the operator and combine functions is to preserve object attributes (including both argument and element names) in the most meaningful (and predictable) way by applying the rules given above. It is not possible for all attributes from all arguments and objects to be preserved intact, so (a) precedence, (b) sequential numerical suffixes, and (c) concatenation are applied to preserve as much of the most appropriate information as possible.

  2. Direct assignment of a named element assigns both the value and attribute, and in this context there is no need to preserve attributes from two objects, or from an argument and an object, so none of the above considerations apply.
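
Applied to the vectors from the question, the two points look like this (w is just a copy introduced here so that v is not overwritten):

```r
v <- c(a = 1, b = 2)
s <- v[2] + v[1]   # operator rule 3: equal lengths, so e1's name is kept
names(s)           # "b"

# combine rule 4: named argument, named element, joined by a dot
names(c(v, sum = s))                    # "a" "b" "sum.b"

# combine rule 3: named argument, unnamed element of length one
names(c(v, sum = unname(s)))            # "a" "b" "sum"
names(c(v, sum = v[[2]] + v[[1]]))      # "a" "b" "sum"

# direct assignment: only the index name is used, the name on s is dropped
w <- v
w["sum"] <- s
names(w)                                # "a" "b" "sum"
```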

answered Oct 17 '22 17:10 by IanRiley