Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select groups based on number of unique / distinct values

I have a data frame like below

sample <- data.frame(ID = 1:9,
                     Group = c('AA','AA','AA','BB','BB','CC','CC','BB','CC'),
                     Value = c(1,1,1,2,2,2,3,2,3))

ID       Group    Value
1        AA       1
2        AA       1
3        AA       1
4        BB       2
5        BB       2
6        CC       2
7        CC       3
8        BB       2
9        CC       3

I want to select groups according to the number of distinct (unique) values within each group. For example, select groups where all values within the group are the same (one distinct value per group). If you look at the group CC, it has more than one distinct value (2 and 3) and should thus be removed. The other groups, with only one distinct value, should be kept. Desired output:

ID       Group    Value
1        AA       1
2        AA       1
3        AA       1
4        BB       2
5        BB       2
8        BB       2

Would you tell me simple and fast code in R that solves the problem?

like image 355
Keith Park Avatar asked Jan 29 '14 02:01

Keith Park


2 Answers

Here's a solution using dplyr:

library(dplyr)

sample <- data.frame(
  ID = 1:9,  
  Group= c('AA', 'AA', 'AA', 'BB', 'BB', 'CC', 'CC', 'BB', 'CC'),  
  Value = c(1, 1, 1, 2, 2, 2, 3, 2, 3)
)

sample %>%
  group_by(Group) %>%
  filter(n_distinct(Value) == 1)

We group the data by Group, and then only select groups where the number of distinct values of Value is 1.

like image 77
hadley Avatar answered Nov 15 '22 20:11

hadley


You can make a selector for sample using ave many different ways.

sample[ ave( sample$Value, sample$Group, FUN = function(x) length(unique(x)) ) == 1,]

or

sample[ ave( sample$Value, sample$Group, FUN = function(x) sum(x - x[1]) ) == 0,]

or

sample[ ave( sample$Value, sample$Group, FUN = function(x) diff(range(x)) ) == 0,]
like image 22
John Avatar answered Nov 15 '22 19:11

John