According to the documentation for data.frame(...), the ... argument has the form:
... these arguments are of either the form value or tag = value. Component names are created based on the tag (if present) or the deparsed argument itself.
Consider a data frame with three columns: a, b, c
DF <- data.frame(a=1:10, b=letters[1:10], c=rnorm(10))
Now consider these three possibilities for creating a new data frame
newDF <- data.frame(x=DF$a)
colnames(newDF) # as expected...
# [1] "x"
newDF <- data.frame(x=DF["a"])
colnames(newDF) # Huh??
# [1] "a"
newDF <- data.frame(x=DF[["a"]])
colnames(newDF) # Why is this necessary??
# [1] "x"
Looking at the class of each RHS:
class(DF$a)
# [1] "integer"
class(DF["a"])
# [1] "data.frame"
class(DF[["a"]])
# [1] "integer"
it appears that, if the RHS is a data.frame, then tag is overridden by the dimname of value.
Also, consider this slightly more complicated example, prompted by this question:
library(xts)
data(sample_matrix)
xtsObject=as.xts(sample_matrix)
head(xtsObject,1)
#                Open     High      Low    Close
# 2007-01-02 50.03978 50.11778 49.95041 50.11778
newDF <- data.frame(x=xtsObject$Open) # would have expected this to work
colnames(newDF) # alas, no...
# [1] "Open"
class(xtsObject$Open)
# [1] "xts" "zoo"
So my question is: what is the rule when using data.frame(tag=value,...)? That is, when can I expect the result to have a column named "tag"?
tl;dr: If the object supplied to data.frame carries no column name of its own, the result will have the name of the tag.
Let's call the optional arguments to data.frame the data. data.frame first creates a list of the data supplied to it. The function then loops through each element of the list, converting it with as.data.frame(..., optional = TRUE). If the converted element has a name, data.frame keeps that name. Technically, for a single-column element it checks whether length(names(xi)) > 0, where xi is the converted element i of the list of data supplied to the function. Only if that element has no names does data.frame use tag as the name. (For an element with more than one column, the tag is instead pasted in front of each column name, giving tag.name.)
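The rule described above can be checked with a small base R sketch. The three-row DF and the variable names names_vec, names_df1 and names_df2 are made up here for illustration and are not part of the original example.

```r
# Small illustrative data frame (not the one from the question).
DF <- data.frame(a = 1:3, b = letters[1:3])

# Unnamed atomic vector: nothing to compete with the tag, so the tag is used.
names_vec <- colnames(data.frame(x = DF$a))

# One-column data frame: it carries its own name, which wins over the tag.
names_df1 <- colnames(data.frame(x = DF["a"]))

# Two-column data frame: the tag is pasted in front of each column name.
names_df2 <- colnames(data.frame(x = DF[c("a", "b")]))

print(names_vec)  # "x"
print(names_df1)  # "a"
print(names_df2)  # "x.a" "x.b"
```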
Getting back to your example, consider the names of the arguments derived from DF that were supplied to data.frame:
names(DF$a)
# NULL
names(DF['a'])
# [1] "a"
names(DF[['a']])
# NULL
Notice that in the first and third cases, names(...) is NULL. That is why data.frame(x = DF$a) and data.frame(x = DF[['a']]) had the expected name: x.
For the more complicated xts object, however, notice that the object resulting from the subset operation with $ has a name:
names(xtsObject$Open)
# [1] "Open"
names(xtsObject[, 'Open'])
# [1] "Open"
Therefore, the data frame created with either data.frame(x=xtsObject[, 'Open']) or data.frame(x=xtsObject$Open) will have the column name Open.
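If you do want the column to be called x, two common workarounds are to strip the name before calling data.frame, or to rename afterwards. The sketch below uses a one-column base R matrix with a column name as a stand-in for xtsObject$Open, so that it runs without xts. The object open_col and its values are assumptions for illustration only, and the same idea should carry over to the xts column, although that is not run here.

```r
# Stand-in for xtsObject$Open: a one-column matrix that carries a column name.
open_col <- matrix(c(50.04, 50.23, 50.42), ncol = 1,
                   dimnames = list(NULL, "Open"))

# As-is: the matrix's own column name overrides the tag.
kept <- colnames(data.frame(x = open_col))

# Workaround 1: drop dims and names first, so only the tag is left.
stripped <- colnames(data.frame(x = as.vector(open_col)))

# Workaround 2: build the data frame, then rename with setNames().
renamed <- colnames(setNames(data.frame(open_col), "x"))

print(kept)      # "Open"
print(stripped)  # "x"
print(renamed)   # "x"
```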
Here is the relevant code where the names are set in data.frame. Note that x is list(...), where the ... is the data.
for (i in seq_len(n)) {
    xi <- if (is.character(x[[i]]) || is.list(x[[i]]))
        as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
    else as.data.frame(x[[i]], optional = TRUE)
    nrows[i] <- .row_names_info(xi)
    ncols[i] <- length(xi)
    namesi <- names(xi)
    if (ncols[i] > 1L) {
        if (length(namesi) == 0L)
            namesi <- seq_len(ncols[i])
        if (no.vn[i])
            vnames[[i]] <- namesi
        else vnames[[i]] <- paste(vnames[[i]], namesi, sep = ".")
    }
    else {
        if (length(namesi))
            vnames[[i]] <- namesi
        else if (no.vn[[i]]) {
            tmpname <- deparse(object[[i]])[1L]
            if (substr(tmpname, 1L, 2L) == "I(") {
                ntmpn <- nchar(tmpname, "c")
                if (substr(tmpname, ntmpn, ntmpn) == ")")
                    tmpname <- substr(tmpname, 3L, ntmpn - 1L)
            }
            vnames[[i]] <- tmpname
        }
    }
    if (mirn && nrows[i] > 0L) {
        rowsi <- attr(xi, "row.names")
        nc <- nchar(rowsi, allowNA = FALSE)
        nc <- nc[!is.na(nc)]
        if (length(nc) && any(nc))
            row.names <- data.row.names(row.names, rowsi,
                i)
    }
    nrows[i] <- abs(nrows[i])
    vlist[[i]] <- xi
}
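To make the naming branch of that loop easier to follow, here is a stripped-down sketch of just that part. The helper column_names_for() is hypothetical and not part of base R. It ignores row names and stringsAsFactors, and for an untagged, unnamed single column it simply returns NULL where the real code falls back to the deparsed argument.

```r
# Hypothetical helper that mirrors only the naming logic of data.frame().
column_names_for <- function(value, tag = NULL) {
  # Same conversion the loop applies to each element.
  xi <- as.data.frame(value, optional = TRUE)
  namesi <- names(xi)
  if (length(xi) > 1L) {
    # Several columns: number them if unnamed, then prefix with the tag if any.
    if (length(namesi) == 0L) namesi <- seq_len(length(xi))
    if (is.null(tag)) namesi else paste(tag, namesi, sep = ".")
  } else if (length(namesi)) {
    # Single column that already has a name: that name wins.
    namesi
  } else {
    # Single unnamed column: fall back to the tag (NULL if there is none).
    tag
  }
}

from_vector <- column_names_for(1:3, "x")
from_df1    <- column_names_for(data.frame(a = 1:3), "x")
from_df2    <- column_names_for(data.frame(a = 1, b = 2), "x")
print(list(from_vector, from_df1, from_df2))
```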