 

R apply error: 'X' must have named dimnames

Tags: dataframe, r

The "apply" documentation mentions that "Where 'X' has named dimnames, it can be a character vector selecting dimension names." I would like to use apply on a data.frame for only particular columns. Can I use the dimnames feature to do this?

I realize I can subset() X to only include the columns of interest, but I want to understand "named dimnames" better.

Below is some sample code:

> x <- data.frame(cbind(1,1:10))
> apply(x,2,sum)
X1 X2
10 55
> apply(x,c('X2'),sum)
Error in apply(x, c("X2"), sum) : 'X' must have named dimnames
> dimnames(x)
[[1]]
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

[[2]]
[1] "X1" "X2"
> names(x)
[1] "X1" "X2"
> names(dimnames(x))
NULL
patrickmdnet asked Aug 05 '11 14:08


2 Answers

If I understand you correctly, you would like to use apply only on certain columns. That is not what named dimnames accomplish. On a matrix or data.frame, apply always works over all the rows or all the columns. Named dimnames only let you refer to the row or column dimension by name instead of by the usual 1 or 2:

m <- matrix(1:12,4, dimnames=list(foo=letters[1:4], bar=LETTERS[1:3]))
apply(m, "bar", sum)  # Use "bar" instead of 2 to refer to the columns
# A  B  C 
#10 26 42 

However, if you know the names of the columns you'd like to apply over, you can select only those columns first:

n <- c("A","C")
apply(m[,n], 2, sum)
# A  C 
#10 42 
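The same column-selection approach works on the data frame from the question: index by column name first, then apply. This is a minimal sketch assuming the x built in the question; `cols`, `res_apply` and `res_sapply` are illustrative names. Note the `drop = FALSE`: without it, selecting a single column returns a plain vector, which apply() rejects. sapply() is shown as an alternative, since a data frame is a list of columns and does not need the matrix coercion that apply() performs.

```r
# Data frame from the question: X1 is all 1s, X2 is 1:10
x <- data.frame(cbind(1, 1:10))

# Select the columns of interest by name, then apply over columns (MARGIN = 2).
# drop = FALSE keeps a one-column selection as a data frame, not a vector.
cols <- c("X2")
res_apply <- apply(x[, cols, drop = FALSE], 2, sum)
print(res_apply)
# X2 
# 55 

# Alternative: iterate over the selected list elements (columns) directly
res_sapply <- sapply(x[cols], sum)
print(res_sapply)
```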

Named dimnames are a side effect of the fact that dimnames are stored as a list in the "dimnames" attribute of a matrix or array. Each component of the list corresponds to one dimension, and the components can be named. This is probably most useful for multidimensional arrays, where a dimension name is easier to read than a position.
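For a multidimensional array, a character MARGIN picks dimensions by name, possibly several at once. A small sketch with an illustrative 2 x 3 x 4 array whose dimension names (`row`, `col`, `layer`) are made up for the example:

```r
# A 2 x 3 x 4 array with a name on each dimension
a <- array(1:24, dim = c(2, 3, 4),
           dimnames = list(row   = c("r1", "r2"),
                           col   = c("c1", "c2", "c3"),
                           layer = paste0("L", 1:4)))

# One total per layer, summing over "row" and "col".
# Equivalent to apply(a, 3, sum).
by_layer <- apply(a, "layer", sum)
print(by_layer)

# Keep "row" and "layer", summing over "col".
# Equivalent to apply(a, c(1, 3), sum).
row_layer <- apply(a, c("row", "layer"), sum)
print(dim(row_layer))  # [1] 2 4
```

Internally apply() matches the character MARGIN against `names(dimnames(a))`, which is why the names must exist on the dimnames list itself.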

For a data.frame, there is no "dimnames" attribute. A data.frame is essentially a list, so the list's "names" attribute corresponds to the column names, and an extra "row.names" attribute corresponds to the row names. Because of this, there is nowhere to store names for the dimnames (an extra attribute could have been added for that, of course, but it wasn't). When you call the dimnames function on a data.frame, it simply builds an unnamed list from the "row.names" and "names" attributes.
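This can be checked directly on the question's data frame. A short sketch (variable `dn` is illustrative) showing that there is no "dimnames" attribute and that dimnames() returns an unnamed list assembled from row.names and names:

```r
x <- data.frame(cbind(1, 1:10))

# Column names live in "names", row names in "row.names";
# there is no "dimnames" attribute on a data frame
print(names(attributes(x)))
print(is.null(attr(x, "dimnames")))   # [1] TRUE

# dimnames() builds its result from those two attributes,
# so the list has no names for apply() to match a character MARGIN against
dn <- dimnames(x)
print(identical(dn, list(row.names(x), names(x))))  # [1] TRUE
print(names(dn))                                    # NULL
```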

Tommy answered Oct 31 '22 01:10


The issue is that you can't name the dimnames of the data frame x directly, and apply() coerces x to a matrix whose dimnames are unnamed.

A solution is to coerce x to a matrix first, name its dimnames, and then call apply():

> X <- as.matrix(x)
> str(X)
 num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:10] "1" "2" "3" "4" ...
  ..$ : chr [1:2] "X1" "X2"
> dimnames(X) <- list(C1 = dimnames(x)[[1]], C2 = dimnames(x)[[2]])
> str(X)
 num [1:10, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "dimnames")=List of 2
  ..$ C1: chr [1:10] "1" "2" "3" "4" ...
  ..$ C2: chr [1:2] "X1" "X2"
> apply(X, "C1", mean)
  1   2   3   4   5   6   7   8   9  10 
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 
> rowMeans(X)
  1   2   3   4   5   6   7   8   9  10 
1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5
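Both points above can be verified in a few lines: assigning named dimnames to the data frame silently drops the names, so apply() still fails, whereas the matrix keeps them. A sketch assuming the question's x; `msg` is an illustrative variable:

```r
x <- data.frame(cbind(1, 1:10))

# Try to attach dimension names directly to the data frame
dimnames(x) <- list(C1 = row.names(x), C2 = names(x))

# The data frame method only updates row.names and names,
# so the dimension names "C1" and "C2" are discarded
print(names(dimnames(x)))   # NULL

# apply() therefore still cannot resolve a character MARGIN
msg <- tryCatch(apply(x, "C1", mean), error = conditionMessage)
print(msg)

# A matrix stores a real "dimnames" attribute, so the names survive
X <- as.matrix(x)
dimnames(X) <- list(C1 = row.names(x), C2 = names(x))
print(names(dimnames(X)))   # [1] "C1" "C2"
```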
Gavin Simpson answered Oct 31 '22 00:10