I have a data set in this format:
User
1
2
3
2
3
1
1
Now I want to add a column, Count, that gives the running count of each user's occurrences so far. I want output in the format below:
User Count
1 1
2 1
3 1
2 2
3 2
1 2
1 3
I have a few solutions (see "Running count variable in R"), but all of them are somewhat slow.
My data.frame has 100,000 rows now, and it may soon grow to 1 million. I need a solution that is also fast.
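To make the answers runnable, here is a minimal sketch that rebuilds the sample data as df1, the name the answers use. It assumes User is stored as an integer column; expected_count is a hypothetical helper vector holding the desired Count column from the table above.

```r
# Sample data from the question (assumed integer User column)
df1 <- data.frame(User = c(1L, 2L, 3L, 2L, 3L, 1L, 1L))

# Hypothetical helper: the desired running count of each User, in row order
expected_count <- c(1L, 1L, 1L, 2L, 2L, 2L, 3L)
```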
An option using dplyr
library(dplyr)
# number the rows within each User group, keeping the original row order
df1 %>%
  group_by(User) %>%
  mutate(Count = row_number())
# User Count
#1 1 1
#2 2 1
#3 3 1
#4 2 2
#5 3 2
#6 1 2
#7 1 3
Using sqldf
library(sqldf)
# self-join: for each row a, count the rows b with the same User at or before it
sqldf('select a.*,
count(*) as Count
from df1 a, df1 b
where a.User = b.User and b.rowid <= a.rowid
group by a.rowid')
# User Count
#1 1 1
#2 2 1
#3 3 1
#4 2 2
#5 3 2
#6 1 2
#7 1 3
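The self-join above counts, for each row a, the rows b with the same User and b.rowid <= a.rowid. A minimal base-R sketch of that same rule (running_count_naive is a hypothetical name used only for illustration) shows why this style of solution can get slow: it compares every row with all of the rows before it, so the work grows roughly with the square of the number of rows.

```r
# Hypothetical illustration of the self-join rule, not part of sqldf:
# for row i, count the rows j <= i whose value equals user[i].
running_count_naive <- function(user) {
  vapply(seq_along(user),
         function(i) sum(user[seq_len(i)] == user[i]),
         integer(1))
}

user <- c(1L, 2L, 3L, 2L, 3L, 1L, 1L)
running_count_naive(user)
# [1] 1 1 1 2 2 2 3
```

This is fine for checking small examples, but at 1 million rows a quadratic approach like this is unlikely to be practical.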
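This is fairly easy with ave and seq_along (seq_along rather than seq.int, because seq.int(x) on a single number x returns 1:x, which misbehaves for users that appear only once):
> with(df1, ave(User, User, FUN = seq_along))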
[1] 1 1 1 2 2 2 3
This is a common strategy. It is often used when the items in each group are adjacent, but ave does not require the members of a group to be in adjacent rows. The second argument is the grouping variable; here the first argument is essentially a dummy, since the only thing it contributes is its length.
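The grouping step can be spelled out with split, lapply and unsplit. This sketch assumes a plain integer vector named user in place of df1$User, and it mirrors the idea behind ave rather than reproducing its source exactly. It also shows why the rows need not be adjacent: split collects each group's members wherever they are, and unsplit puts the results back at the same positions.

```r
user <- c(1L, 2L, 3L, 2L, 3L, 1L, 1L)

# 1. split the values into one vector per user (order within a group is kept)
groups <- split(user, user)
# 2. number each group's elements 1, 2, ...; only the lengths matter here
numbered <- lapply(groups, seq_along)
# 3. put the numbers back at the positions the group members came from
count <- unsplit(numbered, user)
count
# [1] 1 1 1 2 2 2 3
```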