Unexpected behavior with data.frame(tag=value, ...)

Tags: dataframe, r

According to the documentation for data.frame(...), the ... argument has the form:

... these arguments are of either the form value or tag = value. 
    Component names are created based on the tag (if present) or 
    the deparsed argument itself.

Consider a data frame with three columns: a, b, c

DF <- data.frame(a=1:10, b=letters[1:10], c=rnorm(10))

Now consider these three ways of creating a new data frame from it:

newDF <- data.frame(x=DF$a)
colnames(newDF)        # as expected...
# [1] "x"
newDF <- data.frame(x=DF["a"])
colnames(newDF)        # Huh??
# [1] "a"
newDF <- data.frame(x=DF[["a"]])
colnames(newDF)        # Why is this necessary??
# [1] "x"

Looking at the class of each RHS:

class(DF$a)
# [1] "integer"
class(DF["a"])
# [1] "data.frame"
class(DF[["a"]])
# [1] "integer"

It appears that if the RHS is a data.frame, the tag is overridden by the column name of the value.

Also, consider this slightly more complicated example, prompted by this question:

library(xts)
data(sample_matrix)
xtsObject <- as.xts(sample_matrix)
head(xtsObject, 1)
#                Open     High      Low    Close
# 2007-01-02 50.03978 50.11778 49.95041 50.11778
newDF <- data.frame(x=xtsObject$Open)    # would have expected this to work
colnames(newDF)                          # alas, no...
# [1] "Open"
class(xtsObject$Open)
# [1] "xts" "zoo"

So my question is: what is the rule when using data.frame(tag=value,...)? That is, when can I expect the result to have a column named "tag"?

jlhoward asked Nov 02 '22 06:11


1 Answer

tl;dr: If the object supplied to data.frame carries no column names of its own, the result will have the name of the tag.

Let's call the optional arguments to data.frame the data. data.frame first creates a list of the data supplied to it and then loops through each element of that list. Each element is converted with as.data.frame(..., optional = TRUE); if the converted element has names, data.frame keeps them. Technically, it checks length(names(xi)), where xi is the converted version of element i of the list. Only if the converted element has no names does data.frame use the tag as the name.

Getting back to your example, consider the names of arguments derived from DF supplied to data.frame:

names(DF$a)
# NULL
names(DF['a'])
# [1] "a"
names(DF[['a']])
# NULL

Notice that in the first and third cases, names(...) is NULL. That is why data.frame(x = DF$a) and data.frame(x = DF[['a']]) produced the expected name, x.
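To see what data.frame actually inspects, here is a minimal sketch that repeats the conversion step from the loop. converted_names is a hypothetical helper written for this illustration, not a base R function, and named_vec is a made-up example. It also shows a case where names() on the raw argument is misleading: a named atomic vector.

```r
# Same kind of data frame as in the question.
DF <- data.frame(a = 1:10, b = letters[1:10], c = rnorm(10))

# Hypothetical helper (not part of base R): the names data.frame() sees
# for one argument after the as.data.frame(..., optional = TRUE) step.
converted_names <- function(value) {
  names(as.data.frame(value, optional = TRUE))
}

converted_names(DF$a)       # NULL: no names, so the tag is used
converted_names(DF["a"])    # "a": names present, so the tag is ignored
converted_names(DF[["a"]])  # NULL: no names, so the tag is used

# A named atomic vector has names(), but they turn into row names,
# not a column name, so the tag is still used.
named_vec <- c(p = 1, q = 2)
converted_names(named_vec)           # NULL
colnames(data.frame(x = named_vec))  # "x"
```

So the quantity that matters is the names of the converted element, which for a plain vector is always empty.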

For the more complicated xts object, however, notice that the resulting object from the subset operation with $ has a name:

names(xtsObject$Open)
# [1] "Open"
names(xtsObject[, 'Open'])
# [1] "Open"

Therefore, the data frame created with either data.frame(x=xtsObject[, 'Open']) or data.frame(x=xtsObject$Open) will have a column named Open, not x.
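If the goal is a column named after the tag, two possible workarounds are sketched below. To keep the example free of package dependencies, open_col is a made-up one-column matrix standing in for an object that carries its own column name; it is an assumption of this sketch that as.numeric(xtsObject$Open) strips the name in the same way.

```r
# Stand-in for a one-column object that carries its own column name,
# such as xtsObject$Open in the question (base R only, values made up).
open_col <- matrix(c(50.04, 50.23, 50.42), ncol = 1,
                   dimnames = list(NULL, "Open"))

colnames(data.frame(x = open_col))              # "Open": the tag is ignored

# Option 1: drop the dimensions and names before building the data frame.
colnames(data.frame(x = as.numeric(open_col)))  # "x"

# Option 2: build the data frame first, then rename the column explicitly.
renamed <- setNames(data.frame(open_col), "x")
colnames(renamed)                               # "x"
```

Option 2 does not depend on how the object converts to a data frame, so it is the safer choice when the class of the input is not known in advance.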

Here is the relevant code where the names are set in data.frame. Note that x is list(...) where the ... is the data.

for (i in seq_len(n)) {
  xi <- if (is.character(x[[i]]) || is.list(x[[i]])) 
    as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
  else as.data.frame(x[[i]], optional = TRUE)
  nrows[i] <- .row_names_info(xi)
  ncols[i] <- length(xi)
  namesi <- names(xi)
  if (ncols[i] > 1L) {
    if (length(namesi) == 0L) 
      namesi <- seq_len(ncols[i])
    if (no.vn[i]) 
      vnames[[i]] <- namesi
    else vnames[[i]] <- paste(vnames[[i]], namesi, sep = ".")
  }
  else {
    if (length(namesi)) 
      vnames[[i]] <- namesi
    else if (no.vn[[i]]) {
      tmpname <- deparse(object[[i]])[1L]
      if (substr(tmpname, 1L, 2L) == "I(") {
        ntmpn <- nchar(tmpname, "c")
        if (substr(tmpname, ntmpn, ntmpn) == ")") 
          tmpname <- substr(tmpname, 3L, ntmpn - 1L)
        }
      vnames[[i]] <- tmpname
    }
  }
  if (mirn && nrows[i] > 0L) {
    rowsi <- attr(xi, "row.names")
    nc <- nchar(rowsi, allowNA = FALSE)
    nc <- nc[!is.na(nc)]
    if (length(nc) && any(nc)) 
      row.names <- data.row.names(row.names, rowsi, 
                                  i)
  }
  nrows[i] <- abs(nrows[i])
  vlist[[i]] <- xi
}
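
The ncols[i] > 1L branch of the loop above handles multi-column arguments differently: the tag is not dropped but used as a prefix. A small sketch of that behavior, using a made-up two-column data frame:

```r
DF <- data.frame(a = 1:3, b = letters[1:3])

# Tagged multi-column argument: the tag becomes a prefix.
colnames(data.frame(x = DF[c("a", "b")]))        # "x.a" "x.b"

# Untagged multi-column argument: the element's own names are kept.
colnames(data.frame(DF[c("a", "b")]))            # "a" "b"

# Tagged matrix without column names: the column indices are used instead.
colnames(data.frame(x = matrix(1:6, ncol = 2)))  # "x.1" "x.2"
```

The tag is therefore only ignored outright when the argument converts to a single named column.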
Christopher Louden answered Nov 10 '22 05:11