Let's say I have a dataframe with the following structure:
id A B
1 1 1
1 1 2
1 1 2
1 2 2
1 2 3
1 2 4
1 2 5
2 1 2
2 2 2
2 3 2
2 3 5
2 3 5
2 4 6
I'd like to get the most common combination of values in A and B for each id:
id A B
1 1 2
2 3 5
I need to do this for a fairly big dataset (several million rows). I've come up with a couple of horrible, slow, and very un-idiomatic solutions; I'm sure there is an easy, R-ish way.
I think I should be using aggregate, but I can't find a way to do it that works:
> aggregate(cbind(A, B) ~ id, d, Mode)
id A B
1 1 2 2
2 2 3 2
> # wrong!
> aggregate(interaction(A, B) ~ id, d, Mode)
id interaction(A, B)
1 1 1.2
2 2 3.5
> # close, but I need the original columns
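One base-R route that keeps the original columns (a sketch, assuming the example data frame is named `d` as in the question, and sidestepping the custom `Mode` function entirely) is to count each (id, A, B) combination with `aggregate` and then pick the most frequent row per id:

```r
# Rebuild the example data frame from the question
d <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  A  = c(1, 1, 1, 2, 2, 2, 2, 1, 2, 3, 3, 3, 4),
  B  = c(1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 5, 5, 6)
)

# Count occurrences of each (id, A, B) combination
counts <- aggregate(list(n = rep(1L, nrow(d))), d[c("id", "A", "B")], sum)

# Within each id, keep the combination with the highest count
res <- do.call(rbind, lapply(split(counts, counts$id),
                             function(x) x[which.max(x$n), c("id", "A", "B")]))
res
```

Note that `which.max` returns the first maximum, so ties are broken silently by the sort order `aggregate` produces.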
Using dplyr:
library(dplyr)
df %>%
  group_by(id, A, B) %>%
  mutate(n = n()) %>%
  group_by(id) %>%
  slice(which.max(n)) %>%
  select(-n)
#Source: local data frame [2 x 3]
#Groups: id
#
# id A B
#1 1 1 2
#2 2 3 5
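On newer dplyr versions (1.0 or later), the same result can be written a little more directly with count() and slice_max(); a sketch, assuming df holds the example data:

```r
library(dplyr)

df <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  A  = c(1, 1, 1, 2, 2, 2, 2, 1, 2, 3, 3, 3, 4),
  B  = c(1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 5, 5, 6)
)

df %>%
  count(id, A, B) %>%                         # one row per combination, count in n
  group_by(id) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%  # most frequent combination per id
  select(-n)
```

`with_ties = FALSE` guarantees exactly one row per id even when two combinations are equally frequent, mirroring the first-match behavior of `which.max`.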
And a similar data.table approach:
library(data.table)
setDT(df)[, .N, by=.(id, A, B)][, .SD[which.max(N)], by = id]
# id A B N
#1: 1 1 2 2
#2: 2 3 5 2
Edit to include a brief explanation:
Both approaches do essentially the same thing: first count the rows for each (id, A, B) combination, then keep the combination with the highest count within each id.
In the data.table version, you start with setDT(df) to convert the data.frame to a data.table by reference. Then [, .N, by = .(id, A, B)] produces the per-combination counts in a column N, and the chained [, .SD[which.max(N)], by = id] picks, within each id, the row of the subset .SD where N is largest.
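For very large data, a common data.table idiom avoids materializing .SD subsets by computing the winning row numbers with .I and indexing back into the counts table; a sketch on the same example data:

```r
library(data.table)

dt <- data.table(
  id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  A  = c(1, 1, 1, 2, 2, 2, 2, 1, 2, 3, 3, 3, 4),
  B  = c(1, 2, 2, 2, 3, 4, 5, 2, 2, 2, 5, 5, 6)
)

# Count each (id, A, B) combination
counts <- dt[, .N, by = .(id, A, B)]

# .I yields row numbers within counts; take the row of the max N per id,
# then index those rows back and drop the count column
counts[counts[, .I[which.max(N)], by = id]$V1][, !"N"]
```

The inner call returns one row number (column V1) per id, so the outer subset is a single vectorized row lookup rather than a per-group .SD scan.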