I have a dataframe in a long format and I want to filter pairs based on unique combinations of values. I have a dataset that looks like this:
id <- rep(1:4, each=2)
type <- c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
df <- data.frame(id,type)
df
id type
1 1 blue
2 1 blue
3 2 red
4 2 yellow
5 3 blue
6 3 red
7 4 red
8 4 yellow
Let's say each id is a respondent and type is a combination of treatments. Individual 1 saw two objects, both of them blue; individual 2 saw one red object and a yellow one; and so on.
How do I keep, for example, those that saw the combination "red" and "yellow"? If I filter by the combination "red" and "yellow" the resulting dataset should look like this:
id type
3 2 red
4 2 yellow
7 4 red
8 4 yellow
It should keep respondents number 2 and number 4 (only those that saw the combination "red" and "yellow"). Note that it does not keep respondent number 3 because she saw "blue" and "red" (instead of "red" and "yellow"). How do I do this?
One solution is to reshape the dataset into a wide format, filter it by column, and restack again. But I am sure there is another way to do it without reshaping the dataset. Any idea?
A dplyr
solution would be:
library(dplyr)
df <- data_frame(
id = rep(1:4, each = 2),
type = c("blue", "blue", "red", "yellow", "blue", "red", "red", "yellow")
)
types <- c("red", "yellow")
df %>%
group_by(id) %>%
filter(all(types %in% type))
#> # A tibble: 4 x 2
#> # Groups: id [2]
#> id type
#> <int> <chr>
#> 1 2 red
#> 2 2 yellow
#> 3 4 red
#> 4 4 yellow
Allowing for the equal combinations, e.g. blue
, blue
, we have to change the filter-call to the following:
types2 <- c("blue", "blue")
df %>%
group_by(id) %>%
filter(sum(types2 == type) == length(types2))
#> # A tibble: 2 x 2
#> # Groups: id [1]
#> id type
#> <int> <chr>
#> 1 1 blue
#> 2 1 blue
This solution also allows different types
df %>%
group_by(id) %>%
filter(sum(types == type) == length(types))
#> # A tibble: 4 x 2
#> # Groups: id [2]
#> id type
#> <int> <chr>
#> 1 2 red
#> 2 2 yellow
#> 3 4 red
#> 4 4 yellow
Let's use all()
to see if all rows within group match a set of values.
library(tidyverse)
test_filter <- c("red", "yellow")
df %>%
group_by(id) %>%
filter(all(test_filter %in% type))
# A tibble: 4 x 2
# Groups: id [2]
id type
<int> <fctr>
1 2 red
2 2 yellow
3 4 red
4 4 yellow
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With