I recently ran into the following grouped operation: within each group, the elements are assigned evenly spaced values from -0.5 to 0.5 (endpoints included), and a group with only one element is assigned the value 0. For instance, given the following observed groups:
g <- c("A", "A", "B", "B", "A", "C")
Then I would expect assigned values:
outcome <- c(-0.5, 0, -0.5, 0.5, 0.5, 0)
The three observations in group A were assigned values -0.5, 0, and 0.5 (in order), the two observations in group B were assigned values -0.5 and 0.5 (in order), and the one observation in group C was assigned value 0.
Normally, when I perform a grouped operation on one vector to get another vector, I use the ave function, with the form ave(data.vector, group.vector, FUN=function.to.apply.to.each.groups.data.vector.subset). In this operation, however, all I need to know is the number of members in each group, so there is no data.vector. As a result, I ended up making up a data vector that I then ignored in my call to ave:
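# The NA vector is only a placeholder; FUN uses nothing but the group size
ave(rep(NA, length(g)), g, FUN=function(x) {
  if (length(x) == 1) {
    return(0)
  } else {
    # length.out spelled out in full instead of relying on partial matching
    return(seq(-0.5, 0.5, length.out=length(x)))
  }
})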
# [1] -0.5  0.0 -0.5  0.5  0.5  0.0
While this gives me the correct answer, it's obviously pretty unsatisfying to need to make up a data vector that I then ignore. Is there a better way to assign values by group when all that matters is the number of elements in the group?
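For contrast, here is a minimal sketch of the usual ave pattern mentioned above, where the data vector actually matters. The vector x is made-up illustration data, not something from my real problem:

```r
g <- c("A", "A", "B", "B", "A", "C")
# x is hypothetical data, chosen so the group means come out as whole numbers
x <- c(1, 3, 10, 20, 5, 7)
# Each element is replaced by the mean of its own group
group_means <- ave(x, g, FUN = mean)
group_means
# [1]  3  3 15 15  3  7
```

Here FUN genuinely depends on the values in x, which is exactly what my operation lacks.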
From the comments it doesn't seem like there's a version of ave that takes just the group and a function that is called with the number of elements in each group. I suppose this is not particularly surprising since it's a pretty specialized operation.
If I had to do this frequently I could roll my own version of ave with the desired properties as a thin wrapper around ave:
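ave.len <- function(..., FUN) {
  # Grouping vectors arrive via ...; the first one fixes the output length
  l <- list(...)
  # Call ave with a placeholder x and pass FUN each group's size, not its data
  do.call("ave", c(list(x=rep(NA, length(l[[1]]))), l, FUN=function(x) FUN(length(x))))
}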
# Original operation, using @akrun's 1-line command for sequences
g <- c("A", "A", "B", "B", "A", "C")
ave.len(g, FUN=function(n) seq(-0.5, 0.5, length.out=n) * (n != 1) + 0L)
# [1] -0.5  0.0 -0.5  0.5  0.5  0.0
# Group of size n has the n^th letter in the alphabet
ave.len(g, FUN=function(n) rep(letters[n], n))
# [1] "c" "c" "b" "b" "c" "a"
# Multiple grouping vectors via the ... argument (here every element is in its own group)
ave.len(g, 1:6, FUN=function(n) rep(letters[n], n))
# [1] "a" "a" "a" "a" "a" "a"
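If the placeholder vector is still bothersome, the same result can probably be had without ave at all, by computing the group sizes with tabulate and putting the per-group results back in place with unsplit. This is only a sketch under a few assumptions: by.group.size is a hypothetical name (not a base R function), it handles a single grouping vector, and FUN must return a vector of length n when given n:

```r
# Hypothetical helper, not part of base R
by.group.size <- function(g, FUN) {
  f <- factor(g)                             # factor() also drops unused levels
  sizes <- tabulate(f, nbins = nlevels(f))   # group sizes, in level order
  pieces <- lapply(sizes, FUN)               # one result vector per group
  unsplit(pieces, f)                         # restore the original element order
}

g <- c("A", "A", "B", "B", "A", "C")
res <- by.group.size(g, function(n) if (n == 1) 0 else seq(-0.5, 0.5, length.out = n))
res
# [1] -0.5  0.0 -0.5  0.5  0.5  0.0
```

Unlike ave.len, this version never builds a dummy data vector, at the cost of supporting only one grouping vector as written.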