How to assign a counter to a specific subset of a data.frame which is defined by a factor combination?

My question is: I have a data frame with some factor variables. I now want to add a new column to this data frame that numbers the rows within each combination of those factor variables.

   data <- data.frame(fac1 = factor(rep(1:2, 5)), fac2 = sample(letters[1:3], 10, replace = TRUE))

Gives me something like:

        fac1 fac2
     1     1    a
     2     2    c
     3     1    b
     4     2    a
     5     1    c
     6     2    b
     7     1    a
     8     2    a
     9     1    b
     10    2    c

What I want is a counter that numbers the occurrences of each factor combination, like this:

        fac1 fac2  counter
     1     1    a        1
     2     2    c        1
     3     1    b        1
     4     2    a        1
     5     1    c        1
     6     2    b        1
     7     1    a        2
     8     2    a        2
     9     1    b        2
     10    2    c        2

So far I have thought about using tapply to get the counter for every factor combination, which works fine:

counter <- tapply(data$fac1, list(data$fac1, data$fac2), function(x) seq_along(x))

But I do not know how to assign the resulting list of counters (e.g. after unlisting) back to the matching rows of the data frame without using inefficient looping :)

JBJ asked Oct 25 '12 15:10


2 Answers

This is a job for the ave() function:

# Use set.seed for reproducible examples 
#   when random number generation is involved
set.seed(1) 
myDF <- data.frame(fac1 = factor(rep(1:2, 7)), 
                   fac2 = sample(letters[1:3], 14, replace = TRUE), 
                   stringsAsFactors=FALSE)
myDF$counter <- ave(myDF$fac2, myDF$fac1, myDF$fac2, FUN = seq_along)
myDF
#    fac1 fac2 counter
# 1     1    a       1
# 2     2    b       1
# 3     1    b       1
# 4     2    c       1
# 5     1    a       2
# 6     2    c       2
# 7     1    c       1
# 8     2    b       2
# 9     1    b       2
# 10    2    a       1
# 11    1    a       3
# 12    2    a       2
# 13    1    c       2
# 14    2    b       3

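Note the use of stringsAsFactors=FALSE in the data.frame() step. Without it (and on R versions before 4.0.0, where TRUE was the default), fac2 becomes a factor and seq_along's integers cannot be written back into it. In that case you can still get the output with: myDF$counter <- ave(as.character(myDF$fac2), myDF$fac1, myDF$fac2, FUN = seq_along).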
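
A possible variant, sketched under the assumption that an integer counter is wanted: ave() writes the results of FUN back into its first argument, so passing the character column fac2 gives a counter stored as character strings. Passing a plain row index as the first argument should sidestep both that and the factor issue. The data frame df below is made up for illustration and hard-coded so the result does not depend on the random number generator.

```r
# Hypothetical example data (not from the question); fac2 is deliberately a factor
df <- data.frame(fac1 = factor(c(1, 2, 1, 2, 1, 2)),
                 fac2 = c("a", "b", "a", "b", "b", "b"),
                 stringsAsFactors = TRUE)

# The first argument only supplies the values handed to FUN.
# seq_along() ignores those values and numbers the rows within each
# fac1/fac2 group, so an integer index works regardless of column types.
df$counter <- ave(seq_along(df$fac1), df$fac1, df$fac2, FUN = seq_along)

df$counter
# [1] 1 1 2 2 1 3
```

Because the first argument is an integer vector, the counter comes back as an integer and can be used directly in arithmetic or comparisons.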

A5C1D2H2I1M1N2O1R2T1 answered Nov 12 '22 02:11


A data.table solution

library(data.table)
DT <- data.table(data)
DT[, counter := seq_len(.N), by = list(fac1, fac2)]
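
As a self-contained sketch of the same idea, the ten rows printed in the question's first table are hard-coded below so the result is reproducible. The second column, counter2, is my own addition for comparison: it assumes a data.table version that provides rowid() (1.9.8 or later, as far as I know).

```r
library(data.table)

# The ten rows printed in the question, hard-coded instead of sampled
DT <- data.table(fac1 = factor(rep(1:2, 5)),
                 fac2 = c("a", "c", "b", "a", "c", "b", "a", "a", "b", "c"))

# .N is the number of rows in the current group, so seq_len(.N)
# numbers the rows of each fac1/fac2 combination from 1 to N
DT[, counter := seq_len(.N), by = list(fac1, fac2)]

# rowid() computes the same within-group index without a `by` clause
DT[, counter2 := rowid(fac1, fac2)]

DT$counter
# [1] 1 1 1 1 1 1 2 2 2 2
```

The := operator adds the column by reference, so DT is modified in place and no reassignment is needed.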
mnel answered Nov 12 '22 00:11