Conditional filter of grouped factors - dplyr

Question

Say I have this sort of dataframe:

   day value group type id
1    1   0.1     A    X  1
2    1   0.4     A    Y  1
3    2   0.2     A    X  3
4    2   0.5     A    Y  3
5    3   0.3     A    X  5
6    3   0.2     A    Y  6
7    1   0.1     B    X  3
8    1   0.3     B    Y  3
9    2   0.1     B    X 11
10   2   0.4     B    Y 10
11   3   0.2     B    X 12
12   3   0.3     B    Y 12
13   1   0.1     C    X 12
14   1   0.3     C    Y 12
15   2   0.3     C    X  5
16   2   0.2     C    Y  5
17   3   0.2     C    X  3
18   3   0.2     C    Y  2

Data:

library(dplyr)
df1 <- data.frame(
day = rep(1:3,6),
value = c(0.1,0.2,0.3,0.4,0.5,0.2,0.1,0.1,0.2,0.3,0.4,0.3, 0.1,0.3,0.2,0.3,0.2,0.2),
group = rep(LETTERS[1:3], each=6)
) %>% 
  arrange(group,day) %>% 
  mutate(type=rep(LETTERS[24:25],9),
         id = c(1,1,3,3,5,6,3,3,11,10,12,12,12,12,5,5,3,2))  

df1

I want to filter this dataframe based upon a conditional filter. I want to group_by(day, group) and if all id within each grouping are all equal, I want to filter out all rows of type Y but keep rows of type X.

I can do this by running a loop or through several steps of dataframe subsetting, but I was wondering if there is a one/two liner in dplyr or data.table that I am somehow overlooking.

This would be the desired output:

   day value group type id
1    1   0.1     A    X  1
3    2   0.2     A    X  3
5    3   0.3     A    X  5
6    3   0.2     A    Y  6
7    1   0.1     B    X  3
9    2   0.1     B    X 11
10   2   0.4     B    Y 10
11   3   0.2     B    X 12
13   1   0.1     C    X 12
15   2   0.3     C    X  5
17   3   0.2     C    X  3
18   3   0.2     C    Y  2

akrun · Accepted Answer

Here is an one-liner with data.table.

We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'day', 'group', if the length of unique elements of 'id' is 1, we get the Subset of Data.table (.SD) rows where 'type' is 'X' or else get the .SD.

library(data.table)#v1.9.6+
setDT(df1)[, if(uniqueN(id)==1) .SD[type=='X'] else .SD, .(day, group)]
#    day group value type id
# 1:   1     A   0.1    X  1
# 2:   2     A   0.2    X  3
# 3:   3     A   0.3    X  5
# 4:   3     A   0.2    Y  6
# 5:   1     B   0.1    X  3
# 6:   2     B   0.1    X 11
# 7:   2     B   0.4    Y 10
# 8:   3     B   0.2    X 12
# 9:   1     C   0.1    X 12
#10:   2     C   0.3    X  5
#11:   3     C   0.2    X  3
#12:   3     C   0.2    Y  2

Or if the 'type' is already ordered as in the example data

unique(setDT(df1), by = c('day', 'group', 'id'))

If not ordered,

unique(setDT(df1)[order(group,day, id, type)],by = c('day', 'group' , 'id'))

data

df1 <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L, 
2L, 2L, 
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), value = c(0.1, 0.4, 0.2, 0.5, 
0.3, 0.2, 0.1, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.2, 
0.2), group = c("A", "A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "B", "C", "C", "C", "C", "C", "C"), type = c("X", "Y", 
"X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", 
"Y", "X", "Y"), id = c(1L, 1L, 3L, 3L, 5L, 6L, 3L, 3L, 11L, 10L, 
12L, 12L, 12L, 12L, 5L, 5L, 3L, 2L)), .Names = c("day", "value", 
"group", "type", "id"), class = "data.frame", 
row.names = c(NA, -18L))

jazzurro · Answer

Similar to P Lapointe, I had the following. I initially wanted to use all() to check if all id values are identical or not, but that attempt failed. So, I chose to use diff(). Using mutate() I checked if all id values are identical or not for each group. Then, I chose rows which does not have the combination of check == TRUE and type == "Y". Finally, I removed the check column.

group_by(df1, day, group) %>%
mutate(check = any(diff(id) == 0)) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)

#     day value  group  type    id
#   (int) (dbl) (fctr) (chr) (dbl)
#1      1   0.1      A     X     1
#2      2   0.2      A     X     3
#3      3   0.3      A     X     5
#4      3   0.2      A     Y     6
#5      1   0.1      B     X     3
#6      2   0.1      B     X    11
#7      2   0.4      B     Y    10
#8      3   0.2      B     X    12
#9      1   0.1      C     X    12
#10     2   0.3      C     X     5
#11     3   0.2      C     X     3
#12     3   0.2      C     Y     2

EDIT

After communicating with akrun, I revised the code above. Here it is.

group_by(df1, day, group) %>%
mutate(check = n_distinct(id) == 1) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)

Conditional filter of grouped factors - dplyr

Tags:

r

group-by

dplyr

jalapic

2 Answers

data

akrun

jazzurro

Recent Activity

Donate For Us

Conditional filter of grouped factors - dplyr

Tags:

r

group-by

dplyr

jalapic

2 Answers

data

akrun

jazzurro

Related questions

Recent Activity

Donate For Us