Subset multiple columns in R - more elegant code?

Question

I am subsetting a dataframe according to multiple criteria across several columns. I am choosing the rows in the dataframe that contain any one of several values defined in the vector "criteria" in any one of three different columns.

I have some code that works, but wonder what other (more elegant?) ways there are to do this. Here is what I've done:

criteria <-c(1:10)
subset1 <-subset(data, data[, "Col1"] %in% criteria | data[, "Col2"]
 %in% criteria | data[, "Col3"] %in% criteria)

Suggestions warmly welcomed. (I am an R beginner, so very simple explanations about what you are suggesting are also warmly welcomed.)

nograpes · Accepted Answer

I'm not sure if you need two apply calls here:

# Data
df=data.frame(x=1:4,Col1=c(11,12,3,13),Col2=c(9,12,10,13),Col3=c(9,13,42,23))
criteria=1:10

# Solution
df[apply(df [c('Col1','Col2','Col3')],1,function(x) any(x %in% criteria)),]

Unless you want to do a lot of columns, then it is probably more readable to say:

subset(df, Col1 %in% criteria | Col2 %in% criteria | Col3 %in% criteria)

Brian Diggs · Answer

I'm using DF rather than data as the example.

DF[apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), 
               c(1,2), `%in%`, criteria), 
         1, any),]

For a breakdown of what this is doing:

Make a matrix of the specified columns, and for each element in that matrix test if it contains one of the criteria. Then for each row of that matrix, see if any of the row elements are TRUE. If so, keep the corresponding row of the original dataset.

Working through an example:

Start with dummy data:

DF <- data.frame(Col1=seq(1, by=2, length=10),
                 Col2=seq(3, by=3, length=10),
                 Col3=seq(7, by=1, length=10),
                 other=LETTERS[1:10])

which looks like

> DF
   Col1 Col2 Col3 other
1     1    3    7     A
2     3    6    8     B
3     5    9    9     C
4     7   12   10     D
5     9   15   11     E
6    11   18   12     F
7    13   21   13     G
8    15   24   14     H
9    17   27   15     I
10   19   30   16     J

Pull out just the columns of interest.

> as.matrix(DF[c("Col1","Col2","Col3")])
      Col1 Col2 Col3
 [1,]    1    3    7
 [2,]    3    6    8
 [3,]    5    9    9
 [4,]    7   12   10
 [5,]    9   15   11
 [6,]   11   18   12
 [7,]   13   21   13
 [8,]   15   24   14
 [9,]   17   27   15
[10,]   19   30   16

Check each entry versus the criteria

> apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria)
       Col1  Col2  Col3
 [1,]  TRUE  TRUE  TRUE
 [2,]  TRUE  TRUE  TRUE
 [3,]  TRUE  TRUE  TRUE
 [4,]  TRUE FALSE  TRUE
 [5,]  TRUE FALSE FALSE
 [6,] FALSE FALSE FALSE
 [7,] FALSE FALSE FALSE
 [8,] FALSE FALSE FALSE
 [9,] FALSE FALSE FALSE
[10,] FALSE FALSE FALSE

Test if any of the values in a row are TRUE

> apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria), 1, any)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

Use that to index the original data frame.

> DF[apply(apply(as.matrix(DF[c("Col1","Col2","Col3")]), c(1,2), `%in%`, criteria), 1, any),]
  Col1 Col2 Col3 other
1    1    3    7     A
2    3    6    8     B
3    5    9    9     C
4    7   12   10     D
5    9   15   11     E

Subset multiple columns in R - more elegant code?

Tags:

r

subset

user1257313

2 Answers

nograpes

Brian Diggs

Recent Activity

Donate For Us

Subset multiple columns in R - more elegant code?

Tags:

r

subset

user1257313

2 Answers

nograpes

Brian Diggs

Related questions

Recent Activity

Donate For Us