I would like to find the most common values in a column in a data frame. I assume using table would be the best way to do this? I then want to filter/subset my data frame to only include these top-n values.
An example of my data frame is as follows. Here I want to find e.g. the top 2 IDs.
ID col
A blue
A purple
A green
B green
B red
C red
C blue
C yellow
C orange
I therefore want to output the following:
Top 2 values of ID are:
A and C
I will then select the rows corresponding to ID A and C:
ID col
A blue
A purple
A green
C red
C blue
C yellow
C orange
You can try a tidyverse. Add the counts of ID's, then filter for the top two (using < 3) or top ten (using < 11):
library(tidyverse)
d %>%
add_count(ID) %>%
filter(dense_rank(-n) < 3)
# A tibble: 7 x 3
ID col n
<fct> <fct> <int>
1 A blue 3
2 A purple 3
3 A green 3
4 C red 4
5 C blue 4
6 C yellow 4
7 C orange 4
Data
d <- read.table(text="ID col
A blue
A purple
A green
B green
B red
C red
C blue
C yellow
C orange", header=T)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With