
Using regexp to select rows in R dataframe

I'm trying to select rows in a dataframe where the string contained in a column matches either a regular expression or a substring:

dataframe:

aName        bName       pName           call  alleles  logRatio   strength
AX-11086564  F08_ADN103  2011-02-10_R10  AB    CG        0.363371  10.184215
AX-11086564  A01_CD1919  2011-02-24_R11  BB    GG       -1.352707   9.54909
AX-11086564  B05_CD2920  2011-01-27_R6   AB    CG       -0.183802   9.766334
AX-11086564  D04_CD5950  2011-02-09_R9   AB    CG        0.162586  10.165051
AX-11086564  D07_CD6025  2011-02-10_R10  AB    CG       -0.397097   9.940238
AX-11086564  B05_CD3630  2011-02-02_R7   AA    CC        2.349906   9.153076
AX-11086564  D04_ADN103  2011-02-10_R2   BB    GG       -1.898088   9.872966
AX-11086564  A01_CD2588  2011-01-27_R5   BB    GG       -1.208094   9.239801

For example, I want a dataframe containing only rows that contain ADN in column bName. Secondarily, I would like all rows that contain ADN in column bName and that match 2011-02-10_R2 in column pName.

I tried the functions grep(), agrep() and others, but without success.

Eric C. asked Mar 01 '12 17:03



1 Answer

subset(dat, grepl("ADN", bName)  &  pName == "2011-02-10_R2" ) 

Note the use of "&" (not "&&", which is not vectorized) and of "==" (not "=", which is assignment).
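For the first part of the question (rows with ADN in bName only), the same pattern works with a single condition. A minimal sketch, assuming a small hand-built data frame named dat that copies a few bName and pName values from the question; the result names (adn_rows, adn_r2, adn_literal) are only illustrative:

```r
# Hypothetical subset of the question's data, rebuilt by hand for illustration
dat <- data.frame(
  bName = c("F08_ADN103", "A01_CD1919", "B05_CD2920", "D04_ADN103"),
  pName = c("2011-02-10_R10", "2011-02-24_R11", "2011-01-27_R6", "2011-02-10_R2"),
  call  = c("AB", "BB", "AB", "BB"),
  stringsAsFactors = FALSE
)

# Rows whose bName contains "ADN" anywhere in the string
adn_rows <- subset(dat, grepl("ADN", bName))

# Rows whose bName contains "ADN" AND whose pName equals "2011-02-10_R2" exactly
adn_r2 <- subset(dat, grepl("ADN", bName) & pName == "2011-02-10_R2")

# fixed = TRUE treats the pattern as a literal substring, not a regular expression
adn_literal <- subset(dat, grepl("ADN", bName, fixed = TRUE))
```

grepl() returns a logical vector with one element per row, which is why it combines cleanly with "&" here; grep() returns indices by default, so it cannot be combined with a logical condition in the same way.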

Note that you could have used:

 dat[ with(dat,  grepl("ADN", bName)  &  pName == "2011-02-10_R2" ) , ] 

... and that might be preferable when used inside functions. However, it returns rows of NA values wherever the logical expression evaluates to NA, which happens here when dat$pName is NA and bName matches. That defect (which some regard as a feature) can be removed by adding & !is.na(dat$pName) to the logical expression.
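The difference in NA handling can be seen on a toy example. This is a sketch under the assumption of a made-up data frame, dat_na, with missing pName values; it is not the asker's data, and the variable names are illustrative:

```r
# Hypothetical data with missing pName values, for illustration only
dat_na <- data.frame(
  bName = c("F08_ADN103", "D04_ADN103", "A01_CD1919"),
  pName = c(NA, "2011-02-10_R2", NA),
  stringsAsFactors = FALSE
)

# TRUE & NA is NA, while FALSE & NA is FALSE, so only the first row yields NA
keep <- with(dat_na, grepl("ADN", bName) & pName == "2011-02-10_R2")

# Bracket indexing keeps an all-NA row for every NA in the index
with_na <- dat_na[keep, ]

# Adding the is.na() guard turns those NAs into FALSE
no_na <- dat_na[keep & !is.na(dat_na$pName), ]

# subset() already treats NA in the condition as FALSE
via_subset <- subset(dat_na, grepl("ADN", bName) & pName == "2011-02-10_R2")
```

Wrapping the index in which(), as in dat_na[which(keep), ], is another common way to drop the NA positions.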

IRTFM answered Oct 10 '22 04:10