I have in R the following data frame:
ID = c(rep(1,5),rep(2,3),rep(3,2),rep(4,6));ID
VAR = c("A","A","A","A","B","C","C","D",
"E","E","F","A","B","F","C","F");VAR
CATEGORY = c("ANE","ANE","ANA","ANB","ANE","BOO","BOA","BOO",
"CAT","CAT","DOG","ANE","ANE","DOG","FUT","DOG");CATEGORY
DATA = data.frame(ID,VAR,CATEGORY);DATA
That looks like this table below :
ID | VAR | CATEGORY |
---|---|---|
1 | A | ANE |
1 | A | ANE |
1 | A | ANA |
1 | A | ANB |
1 | B | ANE |
2 | C | BOO |
2 | C | BOA |
2 | D | BOO |
3 | E | CAT |
3 | E | CAT |
4 | F | DOG |
4 | A | ANE |
4 | B | ANE |
4 | F | DOG |
4 | C | FUT |
4 | F | DOG |
ideal output given the above data frame in R I want to be like that:
ID | TEXTS | category |
---|---|---|
1 | A | ANE |
2 | C | BOO |
3 | E | CAT |
4 | F | DOG |
More specifically: I want for ID say 1 to search the most common value in the column VAR which is A and then to search the most common value in the column CATEGORY related to the most common value A which is the ANE and so forth.
How can I do it in R ? Imagine that it is sample example.My real data frame contains 850.000 rows and has 14000 unique ID.
Another dplyr
strategy using count
and slice
:
library(dplyr)
DATA %>%
group_by(ID) %>%
count(VAR, CATEGORY) %>%
slice(which.max(n)) %>%
select(-n)
ID VAR CATEGORY
<dbl> <chr> <chr>
1 1 A ANE
2 2 C BOA
3 3 E CAT
4 4 F DOG
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With