
R dplyr: how to remove smaller groups?

Tags: r, dplyr

I would like to remove smaller groups using dplyr. For example, the dataframe:

ID group value
1    1     6
2    1     2
3    2     0
4    2     5
5    2     3
6    3     7
7    3     1
8    4     3
9    4     7
10   4     5
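To make the example reproducible, the table above can be rebuilt as shown in this sketch. The object name df and the integer column types are assumptions; the dplyr answers below call the data df, while the data.table and base R answers call it df1.

```r
# Rebuild the example data (the name `df` and the integer types are assumptions).
df <- data.frame(
  ID    = 1:10,
  group = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

# Number of rows in each group.
group_sizes <- table(df$group)
print(group_sizes)
```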

Groups 1, 2, 3 and 4 have sizes 2, 3, 2 and 3. I want to remove groups 1 and 3, since they have fewer than 3 rows. Thank you in advance!

just_rookie asked Sep 03 '16 01:09


3 Answers

You can use n() to get the number of rows per group and filter on it. Take a look at ?n(); the last example there filters on group size:

df %>% group_by(group) %>% filter(n() >= 3)

# Source: local data frame [6 x 3]
# Groups: group [2]

#      ID group value
#   <int> <int> <int>
# 1     3     2     0
# 2     4     2     5
# 3     5     2     3
# 4     8     4     3
# 5     9     4     7
# 6    10     4     5
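If the threshold needs to vary, the same group_by() plus filter(n() >= ...) pattern can be wrapped in a small function. This is only a sketch: keep_large_groups and min_size are hypothetical names, the grouping column is hard-coded as group, and the data frame is rebuilt here under the assumption that its columns are integers.

```r
library(dplyr)

# Example data from the question (integer column types are an assumption).
df <- data.frame(
  ID    = 1:10,
  group = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

# Hypothetical helper: keep only groups with at least `min_size` rows.
keep_large_groups <- function(data, min_size) {
  data %>%
    group_by(group) %>%
    filter(n() >= min_size) %>%  # n() is the row count of the current group
    ungroup()                    # drop the grouping so later steps are not grouped
}

result <- keep_large_groups(df, min_size = 3)
print(result)
```

The ungroup() call is optional; without it the result stays grouped by group, as in the output shown above.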
Psidom answered Oct 24 '22 00:10


Alternatively, store each group's size in a helper column, filter on it, then drop the column:

df %>% group_by(group) %>% mutate(n = n()) %>% ungroup() %>% filter(n >= 3) %>% select(-n)
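The count-then-filter idea can also be written with dplyr's add_count(), which appends the group size as a column named n without an explicit group_by(). A minimal sketch, assuming the question's data with integer columns and the threshold of 3 from the question:

```r
library(dplyr)

# Example data from the question (integer column types are an assumption).
df <- data.frame(
  ID    = 1:10,
  group = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

result <- df %>%
  add_count(group) %>%  # adds column `n` holding the size of each row's group
  filter(n >= 3) %>%    # keep rows whose group has at least 3 rows
  select(-n)            # remove the helper column again

print(result)
```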
Shenglin Chen answered Oct 24 '22 00:10


We can also use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1), group by 'group', and, if the number of observations in a group (.N) is greater than 2, return that group's Subset of Data.table (.SD):

library(data.table)
setDT(df1)[, if (.N > 2) .SD, by = group]
#    group ID value
#1:     2  3     0
#2:     2  4     5
#3:     2  5     3
#4:     4  8     3
#5:     4  9     7
#6:     4 10     5
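Note that the grouped result above puts the group column first. If the original column order matters, one possible variant collects the row indices (.I) of the qualifying groups and subsets with them. This is a sketch, assuming the question's data with integer columns; keep_rows is a hypothetical name.

```r
library(data.table)

# Example data from the question (integer column types are an assumption).
df1 <- data.table(
  ID    = 1:10,
  group = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

# .I holds the row numbers of the current group; indexing it with the
# length-one logical (.N > 2) returns all of them or none of them.
keep_rows <- df1[, .I[.N > 2], by = group]$V1

# Subset by row number, which leaves the columns in their original order.
result <- df1[keep_rows]
print(result)
```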

Or with base R:

tbl <- table(df1$group) > 2
subset(df1, group %in% names(tbl)[tbl])
#    ID group value
#3   3     2     0
#4   4     2     5
#5   5     2     3
#8   8     4     3
#9   9     4     7
#10 10     4     5
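Another base R option, not part of the answer above, is ave(), which returns each row's group size as a vector aligned with the data, so the filter becomes a plain logical subset. A minimal sketch, assuming the question's data with integer columns; group_size is a hypothetical name.

```r
# Example data from the question (integer column types are an assumption).
df1 <- data.frame(
  ID    = 1:10,
  group = c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L, 4L),
  value = c(6L, 2L, 0L, 5L, 3L, 7L, 1L, 3L, 7L, 5L)
)

# For every row, compute the number of rows in that row's group.
group_size <- ave(df1$ID, df1$group, FUN = length)

# Keep rows whose group has at least 3 rows.
result <- df1[group_size >= 3, ]
print(result)
```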
akrun answered Oct 24 '22 01:10