 

r subsetting based on a character variable for unique ids

Tags:

r

subset

I am trying to subset a data frame based on a "category" variable. More specifically, the category has two levels: a and b. The sample data looks like this:

    id <- c(1,2,2,2,1,1,2,2,1,2)
    category <- c("a", "b", "a", "a", "b", "a","a", "b","a","a")

    data <- data.frame("id"=id, "category"=category)
    > data
       id category
    1   1        a
    2   2        b
    3   2        a
    4   2        a
    5   1        b
    6   1        a
    7   2        a
    8   2        b
    9   1        a
    10  2        a

I would like to obtain only the ids that have more than 3 counts of a or b in the category variable. I was thinking of building a count table first, which might look like this (it does not necessarily need to be printed):

      a   b
1     3   1
2     4   2

and then select the ids that match my criterion:

      a   b
2     4   2

Thanks in advance!

amisos55 asked Nov 23 '25 13:11


2 Answers

One dplyr possibility could be:

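    library(dplyr)

    data %>%
     count(id, category) %>%   # one row per id/category pair, with its count n
     group_by(id) %>%
     # keep ids that have both categories and at least one count above 3
     filter(n_distinct(category) == 2 & any(n > 3))

         id category     n
      <dbl> <fct>    <int>
    1     2 a            4
    2     2 b            2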

If you want the exact output, with dplyr and tidyr, you can do:

    library(dplyr)
    library(tidyr)

    data %>%
     count(id, category) %>%
     group_by(id) %>%
     filter(n_distinct(category) == 2 & any(n > 3)) %>%
     spread(category, n)   # one column per category, filled with the counts

         id     a     b
      <dbl> <int> <int>
    1     2     4     2
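
In recent tidyr releases `spread()` is marked as superseded in favour of `pivot_wider()`. The following is a minimal sketch of the same pipeline with that swap, assuming tidyr 1.0.0 or later is installed; the name `wide` is only illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
data <- data.frame(
  id = c(1, 2, 2, 2, 1, 1, 2, 2, 1, 2),
  category = c("a", "b", "a", "a", "b", "a", "a", "b", "a", "a")
)

wide <- data %>%
  count(id, category) %>%   # one row per id/category pair, with its count n
  group_by(id) %>%
  # same filter as above: both categories present and some count above 3
  filter(n_distinct(category) == 2 & any(n > 3)) %>%
  ungroup() %>%
  # one column per category, filled with the counts
  pivot_wider(names_from = category, values_from = n)

wide
```

For the sample data this should leave a single row for id 2, with one count column per category.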
tmfmnk answered Nov 26 '25 03:11


If you just want the ids that match your criterion, you could use table and rowSums:

    names(which(rowSums(table(data) > 3) != 0))
    [1] "2"
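
If the matching row of the count table (or the matching rows of the original data) is wanted rather than just the id, the same idea can be taken one step further in base R. This is only a sketch under that assumption; `counts`, `keep`, `matching_ids` and `kept` are illustrative names, not part of the original answer.

```r
# Sample data from the question
data <- data.frame(
  id = c(1, 2, 2, 2, 1, 1, 2, 2, 1, 2),
  category = c("a", "b", "a", "a", "b", "a", "a", "b", "a", "a")
)

# Cross-tabulate id against category: rows are ids, columns are categories
counts <- table(data$id, data$category)

# TRUE for every id where at least one category count exceeds 3
keep <- rowSums(counts > 3) > 0

# Rows of the count table that meet the criterion;
# drop = FALSE keeps the two-dimensional shape even for a single row
counts[keep, , drop = FALSE]

# The matching ids as character strings (taken from the table's row names)
matching_ids <- names(keep)[keep]

# Rows of the original data belonging to those ids;
# %in% compares the numeric id with the character names after coercion
kept <- subset(data, id %in% matching_ids)
kept
```

Keeping the logical vector `keep` separate makes it easy to change the threshold or to reuse the selection on other tables built from the same ids.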
Andrew answered Nov 26 '25 03:11


