How do I aggregate a dataframe by customers ID considering a factor frequency?

Tags: r, aggregate

I have a dataframe that records the purchases of different customers, identified by their "ID". It also records where each purchase was made, for example store #1 or store #2:

> head(data)
ID store
1    1
2    3
1    1
1    2
2    3
3    1
3    2

What I've been trying to do is, for each customer, pick the store where they make most of their purchases. The output I'm looking for is a dataframe that looks something like:

ID store
1   1
2   3
3   1

The customer with ID #3 made 2 purchases in different stores, so it's irrelevant which one gets picked by the aggregate function. Customer ID #1, however, made 3 purchases, 2 at store #1 and 1 at store #2, so I have to pick store #1.

I am struggling to find a way to do that, but my approach is based on the aggregate function:

newdata <- aggregate(data$store,list(data$ID),FUN)

Is using the aggregate function the best way to do this? The problem I see here is which function to use as FUN. I have tried, without any success, to use a Mode function I found in a tutorial, and it is defined as:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Any thoughts/ideas?

Thanks,

Bernardo

Bernardo asked Nov 01 '22 10:11


1 Answer

You may try this, basically building on the ideas that you started out with, using aggregate.

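aggregate(store ~ ID, data = df, function(x) {
  # table(x) is indexed by store label, not by position in x,
  # so return the label with the highest count
  as.numeric(names(which.max(table(x))))
})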

#   ID store
# 1  1     1
# 2  2     3
# 3  3     1
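
As a hedged alternative, the Mode helper from the question should also work as FUN once it is written across several lines. The sketch below assumes the sample data is stored in a data frame named df (the name is an assumption, matching the call above). On ties it returns the store that appears first among that customer's rows, because unique() keeps first-appearance order and which.max() picks the first maximum.

```r
# Sample data from the question (the data frame name df is an assumption)
df <- data.frame(
  ID    = c(1, 2, 1, 1, 2, 3, 3),
  store = c(1, 3, 1, 2, 3, 1, 2)
)

# Mode helper from the question, split across lines so that it parses
Mode <- function(x) {
  ux <- unique(x)                          # distinct stores, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]    # store with the highest count
}

# Most frequent store per customer
result <- aggregate(store ~ ID, data = df, FUN = Mode)
print(result)
```

Unlike a table()-based function, this version returns a value of the same type as the store column, so no conversion back from character labels is needed.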
Henrik answered Nov 04 '22 08:11