I'm having trouble understanding what the tapply function does when the FUN argument is NULL.
The documentation says:
If FUN is NULL, tapply returns a vector which can be used to subscript the multi-way array tapply normally produces.
For example, what does the following example from the documentation do?
ind <- list(c(1, 2, 2), c("A", "A", "B"))
tapply(1:3, ind) #-> the split vector
I don't understand the results:
[1] 1 2 4
Thanks.
If you run tapply with a specified function (not NULL), say sum, like in help, you'll see that the result is a 2-dimensional array with NA in one cell:
res <- tapply(1:3, ind, sum)
res
  A  B
1 1 NA
2 2  3
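It means that one combination of factors, namely (1, B), is absent. When FUN is NULL, tapply returns a vector of indices into that array: for each input element, the position (counting down the columns) of the cell its factor combination falls into. Only present factor combinations can appear. To check this: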
> which(!is.na(res))
[1] 1 2 4
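A small sketch of the same idea, reusing ind from the question. It shows that the indices can subscript the array directly, and that the result has one index per input element rather than one per combination. The ind2 list is a made-up extension for illustration and does not come from the documentation.

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))
res <- tapply(1:3, ind, sum)

idx <- tapply(1:3, ind)   # FUN = NULL: cell index for each input element
res[idx]                  # each element's cell value from res: 1 2 3

# Hypothetical extension: the combination (2, "B") now occurs twice
ind2 <- list(c(1, 2, 2, 2), c("A", "A", "B", "B"))
idx2 <- tapply(1:4, ind2)
idx2                      # 1 2 4 4, the last two elements share cell 4
```

So which(!is.na(res)) matches the FUN = NULL result here only because every present combination occurs exactly once.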
One thing to mention: the specified function can return NA itself, as in the following toy example:
> f <- function(x){
  if(x[[1]] == 1) return(NA)
  return(sum(x))
}
> tapply(1:3, ind, f)
   A  B
1 NA NA
2  2  3
So, in general, NA doesn't mean that a factor combination is absent.
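One possible way to tell the two cases apart, sketched with the same ind and toy function f: the indices returned with FUN = NULL mark the cells that actually received data. An NA outside those cells is an absent combination, while an NA inside them was produced by the function. The names res_f and present are only illustrative.

```r
ind <- list(c(1, 2, 2), c("A", "A", "B"))
f <- function(x) {
  if (x[[1]] == 1) return(NA)
  sum(x)
}
res_f <- tapply(1:3, ind, f)

# TRUE for cells that received at least one input element
present <- seq_along(res_f) %in% tapply(1:3, ind)

na_from_f <- which(is.na(res_f) & present)    # cell 1, i.e. (1, A)
na_absent <- which(is.na(res_f) & !present)   # cell 3, i.e. (1, B)
```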