Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional filter of grouped factors - dplyr

Tags:

r

group-by

dplyr

Say I have this sort of dataframe:

   day value group type id
1    1   0.1     A    X  1
2    1   0.4     A    Y  1
3    2   0.2     A    X  3
4    2   0.5     A    Y  3
5    3   0.3     A    X  5
6    3   0.2     A    Y  6
7    1   0.1     B    X  3
8    1   0.3     B    Y  3
9    2   0.1     B    X 11
10   2   0.4     B    Y 10
11   3   0.2     B    X 12
12   3   0.3     B    Y 12
13   1   0.1     C    X 12
14   1   0.3     C    Y 12
15   2   0.3     C    X  5
16   2   0.2     C    Y  5
17   3   0.2     C    X  3
18   3   0.2     C    Y  2

Data:

library(dplyr)
df1 <- data.frame(
day = rep(1:3,6),
value = c(0.1,0.2,0.3,0.4,0.5,0.2,0.1,0.1,0.2,0.3,0.4,0.3, 0.1,0.3,0.2,0.3,0.2,0.2),
group = rep(LETTERS[1:3], each=6)
) %>% 
  arrange(group,day) %>% 
  mutate(type=rep(LETTERS[24:25],9),
         id = c(1,1,3,3,5,6,3,3,11,10,12,12,12,12,5,5,3,2))  

df1

I want to filter this dataframe based upon a conditional filter. I want to group_by(day, group) and if all id within each grouping are all equal, I want to filter out all rows of type Y but keep rows of type X.

I can do this by running a loop or through several steps of dataframe subsetting, but I was wondering if there is a one/two liner in dplyr or data.table that I am somehow overlooking.

This would be the desired output:

   day value group type id
1    1   0.1     A    X  1
3    2   0.2     A    X  3
5    3   0.3     A    X  5
6    3   0.2     A    Y  6
7    1   0.1     B    X  3
9    2   0.1     B    X 11
10   2   0.4     B    Y 10
11   3   0.2     B    X 12
13   1   0.1     C    X 12
15   2   0.3     C    X  5
17   3   0.2     C    X  3
18   3   0.2     C    Y  2
like image 254
jalapic Avatar asked Feb 08 '23 13:02

jalapic


2 Answers

Here is an one-liner with data.table.

We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'day', 'group', if the length of unique elements of 'id' is 1, we get the Subset of Data.table (.SD) rows where 'type' is 'X' or else get the .SD.

library(data.table)#v1.9.6+
setDT(df1)[, if(uniqueN(id)==1) .SD[type=='X'] else .SD, .(day, group)]
#    day group value type id
# 1:   1     A   0.1    X  1
# 2:   2     A   0.2    X  3
# 3:   3     A   0.3    X  5
# 4:   3     A   0.2    Y  6
# 5:   1     B   0.1    X  3
# 6:   2     B   0.1    X 11
# 7:   2     B   0.4    Y 10
# 8:   3     B   0.2    X 12
# 9:   1     C   0.1    X 12
#10:   2     C   0.3    X  5
#11:   3     C   0.2    X  3
#12:   3     C   0.2    Y  2

Or if the 'type' is already ordered as in the example data

unique(setDT(df1), by = c('day', 'group', 'id'))

If not ordered,

unique(setDT(df1)[order(group,day, id, type)],by = c('day', 'group' , 'id'))

data

df1 <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 
2L, 2L, 
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), value = c(0.1, 0.4, 0.2, 0.5, 
0.3, 0.2, 0.1, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.2, 
0.2), group = c("A", "A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "B", "C", "C", "C", "C", "C", "C"), type = c("X", "Y", 
"X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", 
"Y", "X", "Y"), id = c(1L, 1L, 3L, 3L, 5L, 6L, 3L, 3L, 11L, 10L, 
12L, 12L, 12L, 12L, 5L, 5L, 3L, 2L)), .Names = c("day", "value", 
"group", "type", "id"), class = "data.frame", 
row.names = c(NA, -18L))
like image 124
akrun Avatar answered Feb 16 '23 04:02

akrun


Similar to P Lapointe, I had the following. I initially wanted to use all() to check if all id values are identical or not, but that attempt failed. So, I chose to use diff(). Using mutate() I checked if all id values are identical or not for each group. Then, I chose rows which does not have the combination of check == TRUE and type == "Y". Finally, I removed the check column.

group_by(df1, day, group) %>%
mutate(check = any(diff(id) == 0)) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)

#     day value  group  type    id
#   (int) (dbl) (fctr) (chr) (dbl)
#1      1   0.1      A     X     1
#2      2   0.2      A     X     3
#3      3   0.3      A     X     5
#4      3   0.2      A     Y     6
#5      1   0.1      B     X     3
#6      2   0.1      B     X    11
#7      2   0.4      B     Y    10
#8      3   0.2      B     X    12
#9      1   0.1      C     X    12
#10     2   0.3      C     X     5
#11     3   0.2      C     X     3
#12     3   0.2      C     Y     2

EDIT

After communicating with akrun, I revised the code above. Here it is.

group_by(df1, day, group) %>%
mutate(check = n_distinct(id) == 1) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)
like image 42
jazzurro Avatar answered Feb 16 '23 04:02

jazzurro