How to find the most common value in a group, and the most common related value in another column, in R?

I have the following data frame in R:

ID <- c(rep(1, 5), rep(2, 3), rep(3, 2), rep(4, 6))
VAR <- c("A", "A", "A", "A", "B", "C", "C", "D",
         "E", "E", "F", "A", "B", "F", "C", "F")
CATEGORY <- c("ANE", "ANE", "ANA", "ANB", "ANE", "BOO", "BOA", "BOO",
              "CAT", "CAT", "DOG", "ANE", "ANE", "DOG", "FUT", "DOG")

DATA <- data.frame(ID, VAR, CATEGORY)
DATA

It looks like the table below:

ID VAR CATEGORY
1 A ANE
1 A ANE
1 A ANA
1 A ANB
1 B ANE
2 C BOO
2 C BOA
2 D BOO
3 E CAT
3 E CAT
4 F DOG
4 A ANE
4 B ANE
4 F DOG
4 C FUT
4 F DOG

Given the data frame above, the output I want looks like this:

ID VAR CATEGORY
1 A ANE
2 C BOO
3 E CAT
4 F DOG

More specifically: for each ID (say ID 1), I want to find the most common value in the VAR column, which is A, and then the most common value in the CATEGORY column among the rows with that VAR value, which is ANE, and so on for the other IDs.

How can I do this in R? This is only a small example. My real data frame has 850,000 rows and 14,000 unique IDs.

Homer Jay Simpson asked Dec 10 '22 23:12


1 Answer

Another dplyr strategy using count and slice, which keeps the most frequent VAR/CATEGORY pair within each ID:

library(dplyr)
DATA %>% 
    group_by(ID) %>% 
    count(VAR, CATEGORY) %>% 
    slice(which.max(n)) %>% 
    select(-n)
     ID VAR   CATEGORY
  <dbl> <chr> <chr>   
1     1 A     ANE     
2     2 C     BOA     
3     3 E     CAT     
4     4 F     DOG  
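
The question describes a two-step lookup: first the most common VAR per ID, then the most common CATEGORY among the rows with that VAR. That can differ from the most frequent VAR/CATEGORY pair, for example when counts tie, as for ID 2 above. Below is a minimal sketch of the two-step version, assuming ties should go to the value that appears first in the data. The helper `most_common` is a hypothetical name introduced here, not a dplyr function.

```r
library(dplyr)

# Sample data from the question
DATA <- data.frame(
  ID = c(rep(1, 5), rep(2, 3), rep(3, 2), rep(4, 6)),
  VAR = c("A", "A", "A", "A", "B", "C", "C", "D",
          "E", "E", "F", "A", "B", "F", "C", "F"),
  CATEGORY = c("ANE", "ANE", "ANA", "ANB", "ANE", "BOO", "BOA", "BOO",
               "CAT", "CAT", "DOG", "ANE", "ANE", "DOG", "FUT", "DOG"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent value of x.
# unique() keeps first-appearance order and which.max() returns the first
# maximum, so ties go to the value seen first (an assumption about intent).
most_common <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

result <- DATA %>%
  group_by(ID) %>%
  # Step 1: keep only the rows carrying the most common VAR of each ID
  filter(VAR == most_common(VAR)) %>%
  # Step 2: among those rows, take the most common CATEGORY
  summarise(VAR = first(VAR),
            CATEGORY = most_common(CATEGORY),
            .groups = "drop")

result
# Rows for the sample data: (1, A, ANE), (2, C, BOO), (3, E, CAT), (4, F, DOG)
```

For ID 2 the categories BOO and BOA each occur once under C, so either is a valid answer. The first-appearance rule is what yields BOO, matching the desired output in the question.
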
TarJae answered May 03 '23 09:05