I would like to remove smaller groups using dplyr. For example, given the data frame:
ID group value
1 1 6
2 1 2
3 2 0
4 2 5
5 2 3
6 3 7
7 3 1
8 4 3
9 4 7
10 4 5
The sizes of groups 1, 2, 3, and 4 are 2, 3, 2, and 3, and I want to remove groups 1 and 3 since they have fewer than 3 rows. Thank you in advance!
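To make the answers reproducible, here is a sketch that rebuilds the example data from the question. The answers use both df and df1 as the name of the data frame, so both are defined; the integer column types are an assumption based on the <int> columns in the printed output.

```r
# Rebuild the example data frame from the question (integer columns assumed)
df1 <- data.frame(
  ID    = 1:10,
  group = rep(1:4, times = c(2, 3, 2, 3)),  # groups 1..4 with sizes 2, 3, 2, 3
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)
# The dplyr answers call the same data `df`
df <- df1

# Confirm the group sizes stated in the question
table(df1$group)
```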
You can use n() to get the number of rows per group and filter on it. Take a look at ?n: the last example in that help page filters based on the size of groups:
library(dplyr)
df %>% group_by(group) %>% filter(n() >= 3)
# Source: local data frame [6 x 3]
# Groups: group [2]
# ID group value
# <int> <int> <int>
# 1 3 2 0
# 2 4 2 5
# 3 5 2 3
# 4 8 4 3
# 5 9 4 7
# 6 10 4 5
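A sketch of the same idea, first inspecting the per-group counts that n() sees with count(), then wrapping the filter in a reusable function. The helper name keep_groups_min_size() and its min_n parameter are hypothetical, and the {{ }} embracing syntax assumes a recent dplyr/rlang.

```r
library(dplyr)

df <- data.frame(
  ID    = 1:10,
  group = rep(1:4, times = c(2, 3, 2, 3)),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

# Group sizes: one row per group with the row count in column n
sizes <- df %>% count(group)

# Hypothetical helper: keep only groups that have at least `min_n` rows.
# ungroup() at the end returns an ungrouped tibble.
keep_groups_min_size <- function(data, group_col, min_n) {
  data %>%
    group_by({{ group_col }}) %>%
    filter(n() >= min_n) %>%
    ungroup()
}

result <- keep_groups_min_size(df, group, 3)
```

Changing min_n is then the only edit needed to use a different size cutoff.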
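Alternatively, store each group's size in a helper column n and drop the rows of the smallest groups (here, groups 1 and 3):
df %>% group_by(group) %>% mutate(n = n()) %>% ungroup() %>% filter(n != min(n)) %>% select(-n)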
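Note that filter(n != min(n)) removes only the groups tied for the smallest size. That matches "fewer than 3 rows" in the question's data, but not in general. The sketch below compares it with an explicit threshold on made-up data (df2 is hypothetical, with group sizes 1, 2 and 3) and uses add_count(), which adds the group-size column in one step.

```r
library(dplyr)

# Hypothetical data: group "a" has 1 row, "b" has 2, "c" has 3
df2 <- data.frame(
  group = rep(c("a", "b", "c"), times = 1:3),
  value = 1:6,
  stringsAsFactors = FALSE
)

# add_count() appends a column `n` holding each row's group size
by_min <- df2 %>% add_count(group) %>% filter(n != min(n)) %>% select(-n)
by_threshold <- df2 %>% add_count(group) %>% filter(n >= 3) %>% select(-n)

# by_min still keeps group "b" (2 rows); by_threshold keeps only "c"
```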
We can also use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), group by 'group', and if the number of observations within a group (.N) is greater than 2, return the Subset of Data.table (.SD):
library(data.table)
setDT(df1)[, if (.N > 2) .SD, by = group]
# group ID value
#1: 2 3 0
#2: 2 4 5
#3: 2 5 3
#4: 4 8 3
#5: 4 9 7
#6: 4 10 5
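The if (.N > 2) .SD version moves group to the first column, as the output above shows. A sketch of the common .I idiom, which keeps the original column order; the data is rebuilt from the question under the name dt (an assumption for this example).

```r
library(data.table)

dt <- data.table(
  ID    = 1:10,
  group = rep(1:4, times = c(2, 3, 2, 3)),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

# .I holds the row numbers of the current group. .I[.N > 2] returns all of
# them when the group has more than 2 rows, and none otherwise.
# The grouped query stores those row numbers in column V1.
keep <- dt[, .I[.N > 2], by = group]$V1

# Subset by row number, so columns stay in their original order
res <- dt[keep]
```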
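Or with base R:
# logical vector named by group: TRUE for groups with more than 2 rows
tbl <- table(df1$group) > 2
# keep the rows whose group is among the TRUE entries
subset(df1, group %in% names(tbl)[tbl])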
# ID group value
#3 3 2 0
#4 4 2 5
#5 5 2 3
#8 8 4 3
#9 9 4 7
#10 10 4 5
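Another base R option, sketched with ave() instead of table(): ave() applies a function within each group and returns a vector as long as the input, so each row gets its own group's size and no matching on table names (which are character strings) is needed. The data is rebuilt from the question for this example.

```r
df1 <- data.frame(
  ID    = 1:10,
  group = rep(1:4, times = c(2, 3, 2, 3)),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

# For each row, the number of rows in its group (length within each group)
group_size <- ave(df1$ID, df1$group, FUN = length)

# Keep rows whose group has more than 2 rows
res <- df1[group_size > 2, ]
```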