 

Regex for square brackets in R

Tags:

regex

r

Conventionally in R one can escape metacharacters in a regex with two backslashes, e.g. ( becomes \\(, but I find the same isn't true for square brackets.
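A minimal sketch of the escaping described above, using only base R gsub() on hypothetical sample strings. Outside a bracket expression, the doubled backslash in the R string literal reaches the regex engine as a single backslash and does escape the metacharacter, for ( as well as for [.

```r
# The R string "\\(" is the regex \( , which matches a literal "("
no_paren <- gsub("\\(", "", "f(x)")

# The same escaping works for "[" when it is NOT inside a bracket expression
no_bracket <- gsub("\\[", "", "abc[de")

print(no_paren)    # "fx)"
print(no_bracket)  # "abcde"
```

So the escaping itself is fine; the surprise only shows up once the backslashes sit inside [...].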

mystring <- "abc[de"

# remove [, ] and $ characters

gsub("[\\[\\]$]","",mystring)

[1] "abc[de"

[[:punct:]] works but I hate to use a non-standard regex if I don't have to. Can the regex set syntax be used?

asked May 01 '15 18:05 by Patrick McCarthy


1 Answer

You should enable perl = TRUE; then you can use Perl-like syntax, which is more straightforward (IMHO):

gsub("[\\[\\]$]","",mystring, perl = TRUE)

Or, you may use "smart placement": put ] at the start of the bracket expression, where it is treated as a literal char ([ is not special inside a bracket expression, so there is no need to escape it there):

gsub("[][$]","",mystring)


Result:

[1] "abcde"
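To check that both patterns remove all three characters, not just the [ in the question's string, here is a small sketch on a hypothetical input that contains [, ] and $. Only base R gsub() is used; the variable names are illustrative.

```r
# Hypothetical test string containing all three target characters: [, ] and $
s <- "a[b]c$d"

# 1) PCRE (perl = TRUE): backslash escapes work inside a character class
pcre_result <- gsub("[\\[\\]$]", "", s, perl = TRUE)

# 2) Default TRE engine: "]" placed first in the bracket expression is literal,
#    and "[" and "$" need no escaping inside it
tre_result <- gsub("[][$]", "", s)

print(pcre_result)  # "abcd"
print(tre_result)   # "abcd"
```

The second form has the advantage of working without switching engines, at the cost of relying on the positional rule for ].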

More details

The [...] construct is treated as a bracket expression, a POSIX regex construct, by the TRE regex engine, which base R regex functions ((g)sub, grep(l), (g)regexpr) use by default when perl = TRUE is not set. Unlike character classes in NFA regex engines, bracket expressions do not support escape sequences: a \ char inside them is treated as a literal backslash.
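A small sketch of that difference, assuming a hypothetical input string: the same pattern is a valid "match a backslash" bracket expression under TRE, but an unterminated character class under PCRE, where \] escapes the closing bracket. The PCRE case is only checked for raising an error; the exact error text is not relied on.

```r
# Hypothetical input: the R literal "a\\b" is the 3-char string a\b
x <- "a\\b"

# TRE (default): the R string "[\\]" is the regex [\] ; inside a bracket
# expression the backslash is an ordinary char, so this removes it
tre_result <- gsub("[\\]", "", x)

# PCRE (perl = TRUE): the same regex reads \] as an escaped "]", so the class
# is never closed and gsub() signals an error
pcre_compiles <- tryCatch({
  suppressWarnings(gsub("[\\]", "", x, perl = TRUE))
  TRUE
}, error = function(e) FALSE)

print(tre_result)     # "ab"
print(pcre_compiles)  # FALSE
```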

Thus, [\[\]] in a TRE regex matches a \ or [ char (the [\[\] part is actually equal to [\[]) followed by a ]. So it matches \] or [] substrings. Have a look at the gsub("[\\[\\]]", "", "[]\\]ab]") demo: it outputs ab] because [] and \] are matched and removed.
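The demo above can be reproduced with a short sketch that also runs the question's original pattern through the default TRE engine. The second pattern, [\[]], is added here only to illustrate the claim that [\[\] denotes the same set as [\[]; the variable names are illustrative.

```r
s <- "[]\\]ab]"  # the 7-char string []\]ab]

# Regex [\[\]] under TRE: bracket expression [\[\] (the set {\, [}) then a literal ]
r1 <- gsub("[\\[\\]]", "", s)

# Regex [\[]] : the same set {\, [} followed by ], without the redundant backslash
r2 <- gsub("[\\[]]", "", s)

# The question's regex [\[\]$] is read as [\[\] followed by $ and ],
# so it does not match anything in the question's string
r3 <- gsub("[\\[\\]$]", "", "abc[de")

print(r1)  # "ab]"
print(r2)  # "ab]"
print(r3)  # "abc[de"
```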

Note that the terms "POSIX bracket expressions" and "NFA character classes" are used in the same sense as at https://www.regular-expressions.info. This terminology is not quite standard, but there is a need to differentiate between the two.

answered Sep 28 '22 08:09 by Wiktor Stribiżew