The "apply" documentation mentions that "Where 'X' has named dimnames, it can be a character vector selecting dimension names." I would like to use apply on a data.frame for only particular columns. Can I use the dimnames feature to do this?
I realize I can subset() X to only include the columns of interest, but I want to understand "named dimnames" better.
Below is some sample code:
> x <- data.frame(cbind(1,1:10))
> apply(x,2,sum)
X1 X2
10 55
> apply(x,c('X2'),sum)
Error in apply(x, c("X2"), sum) : 'X' must have named dimnames
> dimnames(x)
[[1]]
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
[[2]]
[1] "X1" "X2"
> names(x)
[1] "X1" "X2"
> names(dimnames(x))
NULL
If I understand you correctly, you would like to use apply only on certain columns. That is not what named dimnames accomplish. The apply function on a matrix or data.frame always operates on all the rows or all the columns. Named dimnames let you refer to the row or column dimension by name instead of the usual 1 and 2:
m <- matrix(1:12,4, dimnames=list(foo=letters[1:4], bar=LETTERS[1:3]))
apply(m, "bar", sum) # Use "bar" instead of 2 to refer to the columns
However, if you have the names of the columns you'd like to apply to, you can select only those columns first:
n <- c("A","C")
apply(m[,n], 2, sum)
# A C
#10 42
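The same idea carries over to the data.frame from the question, with one caveat: selecting a single column with `[ , n]` drops the result to a plain vector, which apply cannot handle. A minimal sketch of two ways around that, where `cols` is just an illustrative variable name:

```r
# Data frame from the question
x <- data.frame(cbind(1, 1:10))

# Hypothetical vector of column names to operate on
cols <- c("X2")

# drop = FALSE keeps the selection two-dimensional even for a single column,
# so apply() still sees a row dimension and a column dimension
res_apply <- apply(x[, cols, drop = FALSE], 2, sum)

# List-style subsetting always returns a data frame, so sapply() works as well
res_sapply <- sapply(x[cols], sum)
```

Both results should be a named vector with one sum per selected column.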
Named dimnames are a side effect of the fact that dimnames are stored as a list in the "dimnames" attribute of a matrix or array. Each component of the list corresponds to one dimension and can be named. This is probably more useful for multidimensional arrays...
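To illustrate the multidimensional case, here is a small sketch with a three-dimensional array. The dimension names "row", "col" and "layer" are arbitrary and chosen only for this example:

```r
# A 2 x 3 x 4 array whose three dimensions are named "row", "col" and "layer"
a <- array(1:24, dim = c(2, 3, 4),
           dimnames = list(row = c("r1", "r2"),
                           col = c("c1", "c2", "c3"),
                           layer = paste0("l", 1:4)))

# Keep the "layer" dimension: one total per layer
per_layer <- apply(a, "layer", sum)

# Keep "row" and "col": a row-by-column matrix of sums over the layers
per_cell <- apply(a, c("row", "col"), sum)
```

With three or more dimensions, names such as "layer" tend to be easier to read than remembering that the third margin is 3.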
For a data.frame, there is no "dimnames" attribute. A data.frame is essentially a list, so the list's "names" attribute corresponds to the column names, and an extra "row.names" attribute corresponds to the row names. Because of this, there is no place to store the names of the dimnames (they could have added an extra attribute for that, of course, but they didn't). When you call the dimnames function on a data.frame, it simply creates a list from the "row.names" and "names" attributes.
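The difference in storage can be checked by inspecting the attributes directly. The following is a rough sketch using the data.frame from the question; the names "rows" and "cols" are made up for the illustration:

```r
x <- data.frame(cbind(1, 1:10))
m <- as.matrix(x)

# A data frame keeps its labels in "names" and "row.names"; there is no "dimnames"
df_attrs <- names(attributes(x))
has_dimnames_df <- "dimnames" %in% df_attrs

# A matrix keeps its labels in a single "dimnames" list
has_dimnames_mat <- "dimnames" %in% names(attributes(m))

# Trying to name the dimnames of a data frame has no lasting effect,
# because only the two character vectors are written back
names(dimnames(x)) <- c("rows", "cols")
still_unnamed <- is.null(names(dimnames(x)))
```

Here `has_dimnames_df` should be FALSE, while `has_dimnames_mat` and `still_unnamed` should be TRUE.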
The issue is that you can't set names on the dimnames of x directly, and when x is coerced to a matrix, the result does not have named dimnames either.
A solution is to coerce to a matrix first, then name the dimnames, and then use apply():
> X <- as.matrix(x)
> str(X)
num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10] "1" "2" "3" "4" ...
..$ : chr [1:2] "X1" "X2"
> dimnames(X) <- list(C1 = dimnames(x)[[1]], C2 = dimnames(x)[[2]])
> str(X)
num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "dimnames")=List of 2
..$ C1: chr [1:10] "1" "2" "3" "4" ...
..$ C2: chr [1:2] "X1" "X2"
> apply(X, "C1", mean)
1 2 3 4 5 6 7 8 9 10
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
> rowMeans(X)
1 2 3 4 5 6 7 8 9 10
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5