I am trying to subset a data frame based on a "category" variable. More specifically, the category has two levels: a and b. Sample data looks like this:
id <- c(1,2,2,2,1,1,2,2,1,2)
category <- c("a", "b", "a", "a", "b", "a","a", "b","a","a")
data <- data.frame("id"=id, "category"=category)
> data
id category
1 1 a
2 2 b
3 2 a
4 2 a
5 1 b
6 1 a
7 2 a
8 2 b
9 1 a
10 2 a
I would like to obtain only the ids that have more than 3 counts of a or b in the category variable. I am thinking of building a count table first. The table might look like this (it does not necessarily need to be printed):
a b
1 3 1
2 4 2
and then select the ids that match my criterion:
a b
2 4 2
Thanks in advance!
One dplyr possibility could be:
library(dplyr)
data %>%
count(id, category) %>%
group_by(id) %>%
filter(n_distinct(category) == 2 & any(n > 3))
id category n
<dbl> <fct> <int>
1 2 a 4
2 2 b 2
If you want the exact output, with dplyr and tidyr, you can do:
library(tidyr)  # for spread(); dplyr must be loaded as well
data %>%
count(id, category) %>%
group_by(id) %>%
filter(n_distinct(category) == 2 & any(n > 3)) %>%
spread(category, n)
id a b
<dbl> <int> <int>
1 2 4 2
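As a side note, spread() has been superseded by pivot_wider() in tidyr 1.0.0 and later. A minimal sketch of the same pipeline with pivot_wider(), assuming a recent dplyr and tidyr are installed (the name wide is only illustrative and is not part of the original answer):

```r
library(dplyr)
library(tidyr)

# Sample data from the question
id <- c(1, 2, 2, 2, 1, 1, 2, 2, 1, 2)
category <- c("a", "b", "a", "a", "b", "a", "a", "b", "a", "a")
data <- data.frame(id = id, category = category)

# 'wide' is a hypothetical name used for illustration
wide <- data %>%
  count(id, category) %>%                 # one row per id/category pair with its count n
  group_by(id) %>%
  filter(n_distinct(category) == 2 & any(n > 3)) %>%  # same condition as above
  ungroup() %>%
  pivot_wider(names_from = category, values_from = n) # one column per category

wide
```

This should give a single row for id 2, with one count column per category.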
If you just want the ids that match your criterion, you could use table and rowSums:
names(which(rowSums(table(data) > 3) != 0))
[1] "2"
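The one-liner above can be unpacked step by step, which may make it easier to see what each call contributes. This is only a sketch in base R. The intermediate names (counts, over, keep, ids, subset_data) are illustrative, and the final subsetting step is an assumption about what the question ultimately wants:

```r
# Sample data from the question
id <- c(1, 2, 2, 2, 1, 1, 2, 2, 1, 2)
category <- c("a", "b", "a", "a", "b", "a", "a", "b", "a", "a")
data <- data.frame(id = id, category = category)

counts <- table(data)        # contingency table: rows are ids, columns are categories
over   <- counts > 3         # TRUE where a count is greater than 3
keep   <- rowSums(over) != 0 # TRUE for ids with at least one count greater than 3
ids    <- names(which(keep)) # matching ids, returned as character strings

# Assumed final step: keep only the rows belonging to the matching ids
subset_data <- data[data$id %in% as.numeric(ids), ]
```

Note that names() returns the ids as character strings, hence the as.numeric() conversion before comparing against the numeric id column.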