
dynamic regex in R

Tags: regex, r

The code below works as long as the before and after strings contain no characters that are special in a regex:

before <- 'Name of your Manager (note "self" if you are the Manager)' #parentheses cause problem in regex
after  <- 'CURRENT FOCUS'

pattern <- paste0(c('(?<=', before, ').*?(?=', after, ')'), collapse='')
ex <- regmatches(x, gregexpr(pattern, x, perl=TRUE))

Does R have a function to escape strings to be used in regexes?

dnagirl asked Apr 25 '13 18:04


2 Answers

Perl has the quotemeta function (http://perldoc.perl.org/functions/quotemeta.html) for exactly that. If the documentation is correct when it says

Returns the value of EXPR with all the ASCII non-"word" characters backslashed. (That is, all ASCII characters not matching /[A-Za-z_0-9]/ will be preceded by a backslash in the returned string, regardless of any locale settings.)

then you can achieve the same in R with:

quotemeta <- function(x) gsub("([^A-Za-z_0-9])", "\\\\\\1", x)

And your pattern should be:

pattern <- paste0(c('(?<=', quotemeta(before), ').*?(?=', quotemeta(after), ')'),
                  collapse='')

Quick sanity check:

a <- "he'l(lo)"
grepl(a, a)
# [1] FALSE
grepl(quotemeta(a), a)
# [1] TRUE
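
For completeness, here is a minimal sketch that applies quotemeta to the strings from the question. The question never shows what x contains, so the input below is a made-up example, and the trimws call is only there to drop the spaces around the extracted text.

```r
# Escape every ASCII non-word character, as defined in the answer above
quotemeta <- function(x) gsub("([^A-Za-z_0-9])", "\\\\\\1", x)

before <- 'Name of your Manager (note "self" if you are the Manager)'
after  <- 'CURRENT FOCUS'

# Hypothetical input: the question does not show the real contents of x
x <- paste0(before, ' Jane Doe ', after)

# Build the lookbehind/lookahead pattern from the escaped strings
pattern <- paste0('(?<=', quotemeta(before), ').*?(?=', quotemeta(after), ')')

ex <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
result <- trimws(ex[[1]])
print(result)
# [1] "Jane Doe"
```

Without quotemeta, the parentheses in before would be read as a capture group, so the lookbehind would no longer match the literal text.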
flodel answered Sep 28 '22 07:09


With perl = TRUE, surround the verbatim subpatterns with \Q...\E so that they are matched literally:

# test data
before <- "A."
after <- ".Z"
x <- c("A.xyz.Z", "ABxyzYZ")

pattern <- sprintf('(?<=\\Q%s\\E).*?(?=\\Q%s\\E)', before, after)

which gives:

> gregexpr(pattern, x, perl = TRUE) > 0
[1]  TRUE FALSE
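
To pull out the matched text rather than just test for a match, the same pattern can be passed to regmatches, as in the question. This is a small sketch that reuses the test data above; the variable name res is only illustrative.

```r
# test data from the answer above
before <- "A."
after <- ".Z"
x <- c("A.xyz.Z", "ABxyzYZ")

# \Q...\E makes PCRE treat the enclosed text literally
pattern <- sprintf('(?<=\\Q%s\\E).*?(?=\\Q%s\\E)', before, after)

# One list element per input string; no match gives character(0)
res <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
print(res[[1]])
# [1] "xyz"
```

One caveat: if before or after could itself contain the sequence \E, the quoted section would end early, so an escaping function is the safer choice for arbitrary input.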
G. Grothendieck answered Sep 28 '22 07:09