Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Subset a dataframe by multiple factor levels [duplicate]

Tags:

r

subset

How can I avoid using a loop to subset a dataframe based on multiple factor levels?

In the following example my desired output is a dataframe. The dataframe should contain the rows of the original dataframe where the value in "Code" equals one of the values in "selected".

Working example:

#sample data
Code<-c("A","B","C","D","C","D","A","A")
Value<-c(1, 2, 3, 4, 1, 2, 3, 4)
data<-data.frame(cbind(Code, Value))

selected<-c("A","B") #want rows that contain A and B

#Begin subsetting
result<-data[which(data$Code==selected[1]),]
s1<-2
while(s1<length(selected)+1)
{
  result<-rbind(result,data[which(data$Code==selected[s1]),])
  s1<-s1+1
}

This is a toy example of a much larger dataset, so "selected" may contain a great number of elements and the data a great number of rows. Therefore I would like to avoid the loop.

like image 562
Walter Avatar asked Oct 20 '13 22:10

Walter


3 Answers

You can use %in%

  data[data$Code %in% selected,]
  Code Value
1    A     1
2    B     2
7    A     3
8    A     4
like image 171
Metrics Avatar answered Nov 11 '22 23:11

Metrics


Here's another:

data[data$Code == "A" | data$Code == "B", ]

It's also worth mentioning that the subsetting factor doesn't have to be part of the data frame if it matches the data frame rows in length and order. In this case we made our data frame from this factor anyway. So,

data[Code == "A" | Code == "B", ]

also works, which is one of the really useful things about R.

like image 22
Joe Avatar answered Nov 12 '22 00:11

Joe


Try this:

> data[match(as.character(data$Code), selected, nomatch = FALSE), ]
    Code Value
1      A     1
2      B     2
1.1    A     1
1.2    A     1
like image 4
Jilber Urbina Avatar answered Nov 11 '22 23:11

Jilber Urbina