I have a data frame that records the purchases of different customers, each identified by an "ID". It also records where each purchase was made, for example store #1 or store #2:
> head(data)
ID store
1 1
2 3
1 1
1 2
2 3
3 1
3 2
What I've been trying to do is, for each customer, pick the store where that customer makes most of his or her purchases. The output I'm looking for would be a data frame that looks something like:
ID store
1 1
2 3
3 1
The customer with ID #3 made 2 purchases in different stores, so it's irrelevant which one gets picked by the aggregate function. The customer with ID #1, however, made 3 purchases, 2 at store #1 and 1 at store #2, so I have to pick store #1.
I am struggling to find a way to do that, but my approach is based on the aggregate function:
newdata <- aggregate(data$store,list(data$ID),FUN)
Is using the aggregate function the best way to do this? The problem I see here is which function to use as FUN. I have tried, without success, to use a Mode function I found in a tutorial, which is defined as:
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
Any thoughts/ideas?
Thanks,
Bernardo
You may try this, which builds on the idea you started out with, using aggregate:
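aggregate(store ~ ID, data = data, FUN = function(x) {
  tab <- table(x)  # purchase counts per distinct store, in sorted order
  # which.max(tab) is a position in tab, not in x, so look the store up via names(tab)
  as.numeric(names(tab)[which.max(tab)])
})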
# ID store
# 1 1 1
# 2 2 3
# 3 3 1
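For reference, here is a self-contained sketch that rebuilds the sample data from the question and passes the question's Mode function to aggregate as FUN. The variable names purchases and result are only illustrative, and the store values are assumed to be numeric as in the sample. One point worth checking in any variant: which.max applied to a table of counts returns a position among the distinct values, not a position in the original vector, so the result should index the distinct values (as Mode does with ux).

```r
# Sample data from the question (the name `purchases` is illustrative)
purchases <- data.frame(
  ID    = c(1, 2, 1, 1, 2, 3, 3),
  store = c(1, 3, 1, 2, 3, 1, 2)
)

# Mode function as given in the question:
# returns the most frequent value; on ties, the value seen first wins
Mode <- function(x) {
  ux <- unique(x)                         # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]   # index the distinct values, not x
}

# One row per customer, holding that customer's most frequent store
result <- aggregate(store ~ ID, data = purchases, FUN = Mode)
print(result)

# Order check: store 1 is the most frequent even though 2 appears first
Mode(c(2, 1, 1))
```

Because ties are resolved in favor of the value that appears first, customer #3 gets store 1 here, which is fine since the question says either store is acceptable in that case.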