For the example data:
set.seed(2222)
example_data <- data.frame(col1 = 1:15,
col2 = 16:30,
group = sample(1:3, 15, replace = TRUE))
col1 col2 group
1 1 16 2
2 2 17 1
3 3 18 3
4 4 19 2
5 5 20 3
6 6 21 1
7 7 22 3
8 8 23 1
9 9 24 3
10 10 25 1
11 11 26 2
12 12 27 2
13 13 28 2
14 14 29 3
15 15 30 3
I want to find the top n groups with the most number of records.
Let's say I want to get the top 2 groups with the most number of records. In the data, this would be group 3 and 2:
example_data %>%
group_by(group) %>%
summarise(n = n())
# A tibble: 3 x 2
group n
<int> <int>
1 1 4
2 2 5
3 3 6
The expected output is:
col1 col2 group
1 1 16 2
2 3 18 3
3 4 19 2
4 5 20 3
5 7 22 3
6 9 24 3
7 11 26 2
8 12 27 2
9 13 28 2
10 14 29 3
11 15 30 3
Let's say, you want to select the first 10 rows. The easiest way to do it would be data[1:10, ] .
To get the top values in an R data frame, we can use the head function and if we want the values in decreasing order then sort function will be required. Therefore, we need to use the combination of head and sort function to find the top values in decreasing order.
pandas DataFrame. head() method is used to get the top or bottom N rows of the DataFrame.
We can use table
to calculate frequency for each group
, sort
them in decreasing
order, subset the top 2 entries and filter
the respective groups.
library(dplyr)
example_data %>%
filter(group %in% names(sort(table(group), decreasing = TRUE)[1:2]))
# col1 col2 group
#1 1 16 2
#2 3 18 3
#3 4 19 2
#4 5 20 3
#5 7 22 3
#6 9 24 3
#7 11 26 2
#8 12 27 2
#9 13 28 2
#10 14 29 3
#11 15 30 3
Also you can directly use this in base R subset
subset(example_data, group %in% names(sort(table(group), decreasing = TRUE)[1:2]))
We can use tidyverse
methods for this. Create a frequency column with add_count
, arrange
by that column and filter
the rows where the 'group' is in the last two unique
'group' values
library(dplyr)
example_data %>%
add_count(group) %>%
arrange(n) %>%
filter(group %in% tail(unique(group), 2)) %>%
select(-n)
# A tibble: 11 x 3
# col1 col2 group
# <int> <int> <int>
# 1 1 16 2
# 2 4 19 2
# 3 11 26 2
# 4 12 27 2
# 5 13 28 2
# 6 3 18 3
# 7 5 20 3
# 8 7 22 3
# 9 9 24 3
#10 14 29 3
#11 15 30 3
Or using data.table
library(data.table)
setDT(example_data)[group %in% example_data[, .N, group][order(-N), head(group, 2)]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With