Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Running Count within groups in a dataframe [duplicate]

Tags:

r

dplyr

I have an Orders database for an online shopping platform.

The table I'm working with looks like this, where each line corresponds to one customer/item/date.

OrderHistory <- data.frame(date=c("2015-02-01", "2015-03-01", "2015-04-01", "2015-03-01", "2015-04-01", "2015-05-01", "2015-05-01"), 
            customer=c("A","A","A","B","B","B","B"),
            item=c("Candy", "Coffee", "Coffee", "Candy", "Candy", "Candy", "Coffee" ))

What I would like to get is a running count of the number of times each member has ordered the specific item so I can run analysis on which items are ordered repeatedly by the same customers and which ones are ordered once and never again.

The output would look like

out <- data.frame(date=c("2015-02-01", "2015-03-01", "2015-04-01", "2015-03-01", "2015-04-01", "2015-05-01", "2015-05-01"), 
              member=c("A","A","A","B","B","B","B"),
              item=c("Candy", "Coffee", "Coffee", "Candy", "Candy", "Candy", "Coffee" ),
              count=c(1,1,2,1,2,3,1))

I would love a dplyr solution but I'm open to any suggestions! The exact items on the platform are constantly changing, so the solution would have to be dynamic to account for that.

like image 500
SFuj Avatar asked May 27 '15 19:05

SFuj


2 Answers

I believe this should give you what you want

library(dplyr)
OrderHistory %>%
    group_by(customer, item) %>%
    mutate(count = seq(n()))

Source: local data frame [7 x 4]
Groups: customer, item

        date customer   item count
1 2015-02-01        A  Candy     1
2 2015-03-01        A Coffee     1
3 2015-04-01        A Coffee     2
4 2015-03-01        B  Candy     1
5 2015-04-01        B  Candy     2
6 2015-05-01        B  Candy     3
7 2015-05-01        B Coffee     1
like image 151
cdeterman Avatar answered Nov 07 '22 06:11

cdeterman


You've gotten your "dplyr" answer, but a straightforward way in base R is to use ave:

ave(rep(1, nrow(OrderHistory)), OrderHistory[-1], FUN = seq_along)
# [1] 1 1 2 1 2 3 1

You can also use getanID from my "splitstackshape" package:

library(splitstackshape)
getanID(OrderHistory, c("customer", "item"))
#          date customer   item .id
# 1: 2015-02-01        A  Candy   1
# 2: 2015-03-01        A Coffee   1
# 3: 2015-04-01        A Coffee   2
# 4: 2015-03-01        B  Candy   1
# 5: 2015-04-01        B  Candy   2
# 6: 2015-05-01        B  Candy   3
# 7: 2015-05-01        B Coffee   1
like image 35
A5C1D2H2I1M1N2O1R2T1 Avatar answered Nov 07 '22 06:11

A5C1D2H2I1M1N2O1R2T1