R's tapply with null function

Tags:

r

tapply

I'm having trouble understanding what the tapply function does when the FUN argument is NULL.

The documentation says:

If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.

For example, what does the following example from the documentation do?

ind <- list(c(1, 2, 2), c("A", "A", "B"))
tapply(1:3, ind) #-> the split vector

I don't understand the results:

[1] 1 2 4

Thanks.

Carmellose asked Oct 19 '22 09:10


1 Answer

If you run tapply with a specified function (not NULL), say sum, as in the help page, you'll see that the result is a 2-dimensional array with NA in one cell:

res <- tapply(1:3, ind, sum)
res
  A  B
1 1 NA
2 2  3

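This means that one combination of factors, namely (1, B), is absent. When FUN is NULL, tapply returns a vector with one entry per element of the input: the linear (column-major) index of the cell that element falls into in this array. Here each element lands in its own cell, so the result coincides with the indices of the non-NA cells: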

> which(!is.na(res))
[1] 1 2 4
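
To see the "subscript" wording from the documentation in action, here is a minimal sketch that uses the returned vector to index the array. The second data set (ind2) is made up for illustration and is not from the question; it puts two elements into the same cell, so the index repeats.

```r
# Grouping from the question
ind <- list(c(1, 2, 2), c("A", "A", "B"))
res <- tapply(1:3, ind, sum)

idx <- tapply(1:3, ind)  # FUN = NULL: one cell index per input element
idx                      # 1 2 4
res[idx]                 # 1 2 3, each element's own group sum

# Hypothetical data: elements 2 and 4 both fall into cell (2, "A")
ind2 <- list(c(1, 2, 2, 2), c("A", "A", "B", "A"))
res2 <- tapply(1:4, ind2, sum)
idx2 <- tapply(1:4, ind2)
idx2                     # 1 2 4 2
res2[idx2]               # 1 6 3 6, group sums mapped back to each element
```

So the result always has the same length as the input, and an index can occur more than once.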

One caveat: the specified function can itself return NA, as in the following toy example:

> f <- function(x){
      if(x[[1]] == 1) return(NA)
      return(sum(x))
  }
> tapply(1:3, ind, f)
   A  B
1 NA NA
2  2  3

So, in general, NA doesn't mean that a factor combination is absent.
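
If the goal is to find which combinations are actually present, one option (a sketch, not the only way) is to count elements per cell with table instead of inspecting NA values. The variable names counts and present are chosen here for illustration.

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))

# Number of input elements in each (row, column) cell
counts <- table(ind[[1]], ind[[2]])

# Linear indices of the non-empty cells, independent of what FUN returns
present <- which(counts > 0)
present                            # 1 2 4

# The same set, derived from tapply with FUN = NULL
sort(unique(tapply(1:3, ind)))     # 1 2 4
```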

Iaroslav Domin answered Oct 21 '22 01:10