Subset by multiple conditions

Tags: r, subset

Maybe it's something basic, but I couldn't find the answer.
I have

Id Year V1  
 1 2009 33   
 1 2010 67  
 1 2011 38  
 2 2009 45  
 3 2009 65  
 3 2010 74  
 4 2009 47  
 4 2010 51  
 4 2011 14

I need to select only the rows whose Id appears in all three years: 2009, 2010 and 2011.

Id Year V1  
 1 2009 33  
 1 2010 67  
 1 2011 38  
 4 2009 47  
 4 2010 51  
 4 2011 14   

I tried

d1_3 <- subset(d1, Year==2009 |Year==2010 |Year==2011 )

but it doesn't work.

Can anyone suggest how I can do this in R?

Tappin73 asked Mar 18 '14 09:03


2 Answers

I think ave could be useful here. I call your original data frame 'df'. For each Id, check whether each of the years 2009 to 2011 is present in Year (2009:2011 %in% x). This gives a logical vector, which can be summed. Test whether the sum equals 3 (if all three years are present, the sum is 3). Because ave returns a vector of the same type as df$Year, the result is a numeric 0/1 vector with one value per row, so it is converted with as.logical before being used to subset the rows of the data frame.

df[as.logical(ave(df$Year, df$Id, FUN = function(x) sum(2009:2011 %in% x) == 3)), ]
#   Id Year V1
# 1  1 2009 33
# 2  1 2010 67
# 3  1 2011 38
# 7  4 2009 47
# 8  4 2010 51
# 9  4 2011 14
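
For anyone who wants to run the steps above one at a time, here is a minimal sketch. The data frame is rebuilt by hand from the question, and the intermediate names flag, keep and result are only illustrative; they are not part of the original answer.

```r
# Rebuild the example data from the question
df <- data.frame(
  Id   = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
  Year = c(2009, 2010, 2011, 2009, 2009, 2010, 2009, 2010, 2011),
  V1   = c(33, 67, 38, 45, 65, 74, 47, 51, 14)
)

# For each Id, count how many of 2009, 2010 and 2011 occur in its Year values
# and test whether the count is 3. ave() repeats the answer for every row of
# the group and returns it as a numeric 0/1 vector (same type as df$Year).
flag <- ave(df$Year, df$Id, FUN = function(x) sum(2009:2011 %in% x) == 3)

# Convert 0/1 to FALSE/TRUE before indexing. Indexing with the raw 0/1 values
# would be read as row positions, not as a keep/drop mask.
keep <- as.logical(flag)

result <- df[keep, ]
print(result)
```

Splitting the one-liner this way makes it easy to inspect flag and see which Ids pass the test.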
Henrik answered Oct 24 '22 02:10


Another way of using ave, with all() instead of a sum, and == 1 to turn ave's 0/1 result into a logical index:

DF
##   Id Year V1
## 1  1 2009 33
## 2  1 2010 67
## 3  1 2011 38
## 4  2 2009 45
## 5  3 2009 65
## 6  3 2010 74
## 7  4 2009 47
## 8  4 2010 51
## 9  4 2011 14


DF[ave(DF$Year, DF$Id, FUN = function(x) all(2009:2011 %in% x)) == 1, ]
##   Id Year V1
## 1  1 2009 33
## 2  1 2010 67
## 3  1 2011 38
## 7  4 2009 47
## 8  4 2010 51
## 9  4 2011 14
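
As a rough sketch of how this could be reused, the same test can be wrapped in a small function. The name keep_complete_ids and its years argument are hypothetical, introduced here only for illustration, and the function assumes the columns are called Id and Year as in the question. The sketch also shows why the attempt in the question returns every row: that condition is checked row by row, not per Id.

```r
# Example data from the question
DF <- data.frame(
  Id   = c(1, 1, 1, 2, 3, 3, 4, 4, 4),
  Year = c(2009, 2010, 2011, 2009, 2009, 2010, 2009, 2010, 2011),
  V1   = c(33, 67, 38, 45, 65, 74, 47, 51, 14)
)

# The attempt from the question tests each row on its own, so every row
# passes: each Year is one of 2009, 2010 or 2011.
row_wise <- subset(DF, Year == 2009 | Year == 2010 | Year == 2011)

# Hypothetical helper: keep all rows of every Id whose Year values
# include every element of `years`.
keep_complete_ids <- function(data, years) {
  complete <- ave(data$Year, data$Id,
                  FUN = function(x) all(years %in% x)) == 1
  data[complete, ]
}

complete_rows <- keep_complete_ids(DF, 2009:2011)
print(complete_rows)
```

Passing the years as an argument means the same call works for a different period without editing the condition.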
CHP answered Oct 24 '22 03:10