 

R count occurrences of an element by groups [duplicate]

Tags:

r

counter

What is the easiest way to count the occurrences of an element in a vector or data.frame within each group?
I don't mean just counting the total (as other Stack Overflow questions ask) but giving a different number to every successive occurrence.

For example, for this simple data frame (though I will be working with data frames that have more columns):

mydata <- data.frame(A=c("A","A","A","B","B","A", "A"))

I've found this solution:

cbind(mydata,myorder=ave(rep(1,nrow(mydata)),mydata$A, FUN=cumsum))   

and here is the result:

 A myorder  
 A       1  
 A       2  
 A       3  
 B       1  
 B       2  
 A       4  
 A       5  
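A possibly simpler base-R variant of the ave() approach above, assuming only base R: instead of building a vector of ones and taking its cumsum, apply seq_along to each group directly. The variable name myorder is just illustrative.

```r
# Base R sketch: number each occurrence of a value within its group.
mydata <- data.frame(A = c("A", "A", "A", "B", "B", "A", "A"))

# ave() splits the first argument by mydata$A, applies FUN to each piece,
# and writes the results back in the original row order.
# seq_along() on each piece yields 1, 2, ..., n for that group.
myorder <- ave(seq_along(mydata$A), mydata$A, FUN = seq_along)

mydata$myorder <- myorder
print(mydata)
```

This gives 1, 2, 3, 1, 2, 4, 5, the same numbering as the cumsum version.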

Isn't there a single command to do it? Or a specialized package?

I want to use it later with tidyr's spread() function.

My question is not the same as "Is there an aggregate FUN option to count occurrences?" because I don't want the total number of occurrences at the end but the cumulative number of occurrences up to each element.

OK, my problem is a little more complex:

mydata <- data.frame(group=c("x","x","x","x","y","y", "y"), letter=c("A","A","A","B","B","A", "A"))

I only know how to solve the first example above. But what happens when I also want to group by a second variable? Something like occurrences(letter) by group.

group letter  "occurrences within group"  
 x      A       1  
 x      A       2  
 x      A       3  
 x      B       1  
 y      B       1  
 y      A       1  
 y      A       2  

I've found a way with

ave(rep(1,nrow(mydata)),list(mydata$group, mydata$letter), FUN=cumsum)

though there should be something easier.
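A sketch of the two-variable case, assuming base R only: ave() also accepts several grouping vectors as separate arguments (it combines them internally with interaction()), so the list() wrapper is optional, and seq_along can replace the rep(1, ...) plus cumsum combination.

```r
mydata <- data.frame(group  = c("x", "x", "x", "x", "y", "y", "y"),
                     letter = c("A", "A", "A", "B", "B", "A", "A"))

# Each distinct (group, letter) pair is numbered separately, 1..n,
# and the numbers are returned in the original row order.
mydata$myorder <- ave(seq_len(nrow(mydata)), mydata$group, mydata$letter,
                      FUN = seq_along)
print(mydata)
```

Here myorder becomes 1, 2, 3, 1, 1, 1, 2, matching the table above.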

asked Sep 15 '15 12:09 by skan


1 Answer

Using data.table

library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]

The by argument makes data.table evaluate the expression separately within each group defined by the columns group and letter. .N is the number of rows in the current group (if by were omitted, it would be the number of rows in the whole table), so within each sub-table the rows are numbered from 1 to the number of rows in that sub-table.
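A minimal base-R sketch of what the by/.N combination does, for readers without data.table: split the row indices by the grouping columns, then number each piece from 1 to its length. The helper name number_within is hypothetical; it only illustrates the mechanism and is not how data.table is implemented internally.

```r
# Hypothetical helper (not part of data.table): mimics 1:.N with by = cols
number_within <- function(df, cols) {
  # One factor level per distinct combination of the grouping columns
  key <- interaction(df[cols], drop = TRUE)
  out <- integer(nrow(df))
  # split() yields the row indices belonging to each sub-table
  for (idx in split(seq_len(nrow(df)), key)) {
    out[idx] <- seq_along(idx)  # rows 1..N of this sub-table
  }
  out
}

mydata <- data.frame(group  = c("x", "x", "x", "x", "y", "y", "y"),
                     letter = c("A", "A", "A", "B", "B", "A", "A"))
mydata$myorder <- number_within(mydata, c("group", "letter"))
print(mydata)
```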

mydata
   group letter myorder
1:     x      A       1
2:     x      A       2
3:     x      A       3
4:     x      B       1
5:     y      B       1
6:     y      A       1
7:     y      A       2

Or a dplyr solution, which is essentially the same:

library(dplyr)

mydata %>% 
  group_by(group, letter) %>% 
  mutate(myorder = 1:n())
answered Sep 23 '22 10:09 by Akhil Nair