
String matching on two columns in [R]

I want to match multiple string criteria and then subset the matching rows in R, using grepl to find the matches. I found a nice solution in another post (the code is specific to that question, but you get the idea): subset(GEMA_EO5, grepl(paste(l, collapse="|"), GEMA_EO5$RefSeq_ID))

Is it possible to search two columns instead of just RefSeq_ID as in the example above, either with grepl or by any other method? In other words, I would like to look for the options in l not just in one column, but in two (or however many).

E.g., with 3 columns a, b and c, I would like criteria such that the rows containing T (rows 3 and 4) are selected, despite the format "T I" in (3,b). It should identify both (4,a) and (3,b), hence the link to the previous question. I want it to look in both column a and column b, not just one of them.

    a    b     c
    A    A C   P L
    V    V B   W E E
    W    T I   P J G
    T    W P   J
kirk asked Jun 03 '13 13:06



1 Answer

Here's some demo data to show how this works:

set.seed(1234)
dat <- data.frame(A = sample(letters[1:3],10,TRUE),
                  B = sample(letters[1:3],10,TRUE))

Using [ to subset makes this a lot clearer, in my opinion: we can use grepl to get a logical vector marking the rows that match, and use | to combine two tests (one per column). If you wanted a subset of all the rows that contain an 'a' in either column:

dat.a <- dat[with(dat, grepl("a", A) | grepl("a", B)), ]
dat.a
  A B
1 b a
2 b a
3 a c
5 a a
9 a a
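
The same idea extends to several search terms and any number of columns. The following is only a sketch applied to the table from the question: the names df, cols, pattern and hits are made up for illustration, l holds a single hypothetical term, and the \\b word boundaries are an assumption about how "T" should match inside a cell such as "T I". It treats a row as a hit when any of the listed columns matches, which is what selecting rows 3 and 4 requires.

```r
# Rebuild the example table from the question (values copied from it).
df <- data.frame(
  a = c("A", "V", "W", "T"),
  b = c("A C", "V B", "T I", "W P"),
  c = c("P L", "W E E", "P J G", "J"),
  stringsAsFactors = FALSE
)

l    <- c("T")        # search terms (hypothetical; add more as needed)
cols <- c("a", "b")   # columns to search

# Join the terms with "|" and wrap them in word boundaries so that "T"
# matches the token in "T I" but not a longer token that contains a T.
pattern <- paste0("\\b(", paste(l, collapse = "|"), ")\\b")

# One logical vector per column, then OR them together row by row.
hits <- Reduce(`|`, lapply(df[cols], function(col) grepl(pattern, col, perl = TRUE)))

df[hits, ]
```

Adding a column name to cols or a term to l is enough to widen the search. If a row should only be kept when every listed column matches, swapping `|` for `&` in the Reduce call would do that instead.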
alexwhan answered Sep 28 '22 20:09
