Say I have this sort of dataframe:
day value group type id
1 1 0.1 A X 1
2 1 0.4 A Y 1
3 2 0.2 A X 3
4 2 0.5 A Y 3
5 3 0.3 A X 5
6 3 0.2 A Y 6
7 1 0.1 B X 3
8 1 0.3 B Y 3
9 2 0.1 B X 11
10 2 0.4 B Y 10
11 3 0.2 B X 12
12 3 0.3 B Y 12
13 1 0.1 C X 12
14 1 0.3 C Y 12
15 2 0.3 C X 5
16 2 0.2 C Y 5
17 3 0.2 C X 3
18 3 0.2 C Y 2
Data:
library(dplyr)
df1 <- data.frame(
day = rep(1:3,6),
value = c(0.1,0.2,0.3,0.4,0.5,0.2,0.1,0.1,0.2,0.3,0.4,0.3, 0.1,0.3,0.2,0.3,0.2,0.2),
group = rep(LETTERS[1:3], each=6)
) %>%
arrange(group,day) %>%
mutate(type=rep(LETTERS[24:25],9),
id = c(1,1,3,3,5,6,3,3,11,10,12,12,12,12,5,5,3,2))
df1
I want to filter this dataframe based upon a conditional filter. I want to group_by(day, group)
and if all id
within each grouping are all equal, I want to filter
out all rows of type Y but keep rows of type X.
I can do this by running a loop or through several steps of dataframe subsetting, but I was wondering if there is a one/two liner in dplyr
or data.table
that I am somehow overlooking.
This would be the desired output:
day value group type id
1 1 0.1 A X 1
3 2 0.2 A X 3
5 3 0.3 A X 5
6 3 0.2 A Y 6
7 1 0.1 B X 3
9 2 0.1 B X 11
10 2 0.4 B Y 10
11 3 0.2 B X 12
13 1 0.1 C X 12
15 2 0.3 C X 5
17 3 0.2 C X 3
18 3 0.2 C Y 2
Here is an one-liner with data.table
.
We convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'day', 'group', if
the length
of unique
elements of 'id' is 1, we get the Subset of Data.table (.SD
) rows where 'type' is 'X' or else
get the .SD
.
library(data.table)#v1.9.6+
setDT(df1)[, if(uniqueN(id)==1) .SD[type=='X'] else .SD, .(day, group)]
# day group value type id
# 1: 1 A 0.1 X 1
# 2: 2 A 0.2 X 3
# 3: 3 A 0.3 X 5
# 4: 3 A 0.2 Y 6
# 5: 1 B 0.1 X 3
# 6: 2 B 0.1 X 11
# 7: 2 B 0.4 Y 10
# 8: 3 B 0.2 X 12
# 9: 1 C 0.1 X 12
#10: 2 C 0.3 X 5
#11: 3 C 0.2 X 3
#12: 3 C 0.2 Y 2
Or if the 'type' is already ordered as in the example data
unique(setDT(df1), by = c('day', 'group', 'id'))
If not ordered,
unique(setDT(df1)[order(group,day, id, type)],by = c('day', 'group' , 'id'))
df1 <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L,
2L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), value = c(0.1, 0.4, 0.2, 0.5,
0.3, 0.2, 0.1, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.2,
0.2), group = c("A", "A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "B", "C", "C", "C", "C", "C", "C"), type = c("X", "Y",
"X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X",
"Y", "X", "Y"), id = c(1L, 1L, 3L, 3L, 5L, 6L, 3L, 3L, 11L, 10L,
12L, 12L, 12L, 12L, 5L, 5L, 3L, 2L)), .Names = c("day", "value",
"group", "type", "id"), class = "data.frame",
row.names = c(NA, -18L))
Similar to P Lapointe, I had the following. I initially wanted to use all()
to check if all id values are identical or not, but that attempt failed. So, I chose to use diff()
. Using mutate()
I checked if all id values are identical or not for each group. Then, I chose rows which does not have the combination of check == TRUE and type == "Y"
. Finally, I removed the check column.
group_by(df1, day, group) %>%
mutate(check = any(diff(id) == 0)) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)
# day value group type id
# (int) (dbl) (fctr) (chr) (dbl)
#1 1 0.1 A X 1
#2 2 0.2 A X 3
#3 3 0.3 A X 5
#4 3 0.2 A Y 6
#5 1 0.1 B X 3
#6 2 0.1 B X 11
#7 2 0.4 B Y 10
#8 3 0.2 B X 12
#9 1 0.1 C X 12
#10 2 0.3 C X 5
#11 3 0.2 C X 3
#12 3 0.2 C Y 2
EDIT
After communicating with akrun, I revised the code above. Here it is.
group_by(df1, day, group) %>%
mutate(check = n_distinct(id) == 1) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With