Conventionally in R one can escape a regex metacharacter with two backslashes, e.g. ( becomes \\(, but I find the same isn't true for square brackets.
mystring <- "abc[de"
# remove the [, ] and $ characters
gsub("[\\[\\]$]","",mystring)
[1] "abc[de"
[[:punct:]] works, but I hate to use a non-standard regex if I don't have to. Can the regex set syntax be used?
You should enable perl = TRUE; then you can use Perl-like syntax, which is more straightforward (IMHO):
gsub("[\\[\\]$]","",mystring, perl = TRUE)
Or you may use "smart placement", putting ] at the start of the bracket expression ([ is not special inside it, so there is no need to escape [ there):
gsub("[][$]","",mystring)
Result:
[1] "abcde"
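As a minimal sketch, the two fixes above can be compared side by side. The variable names with_perl and with_placement are only illustrative, and the code assumes a stock base R installation:

```r
mystring <- "abc[de"

# Option 1: perl = TRUE switches to the PCRE engine, which honours
# backslash escapes inside [...]
with_perl <- gsub("[\\[\\]$]", "", mystring, perl = TRUE)

# Option 2: default TRE engine; "]" placed first in the bracket
# expression is literal, and "[" and "$" need no escaping inside it
with_placement <- gsub("[][$]", "", mystring)

print(with_perl)       # [1] "abcde"
print(with_placement)  # [1] "abcde"
```

The "smart placement" form avoids any dependence on escape handling, which is why it works without switching engines.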
More details
The [...] construct is treated as a bracket expression, which is a POSIX regex construct, by the TRE regex engine (used by default in base R regex functions such as (g)sub, grep(l), and (g)regexpr when called without perl=TRUE). Bracket expressions, unlike character classes in NFA regex engines, do not support escape sequences, i.e. the \ char is treated as a literal backslash inside them.
Thus, [\[\]] in a TRE regex matches a \ or [ char (the [\[\] part, which is actually equal to [\[]) and then a ]. So it matches the \] or [] substrings. Just have a look at gsub("[\\[\\]]", "", "[]\\]ab]"): it outputs ab] because [] and \] are matched and removed.
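To make the difference visible, here is a small sketch that lists what the same pattern matches under each engine, using regmatches() with gregexpr(). The names x, tre_matches and pcre_matches are illustrative only:

```r
x <- "[]\\]ab]"          # the literal text []\]ab]
pattern <- "[\\[\\]]"    # the regex [\[\]]

# Default TRE engine: the bracket expression contains "\" and "[",
# and the final "]" is a separate literal that must follow it
tre_matches <- regmatches(x, gregexpr(pattern, x))[[1]]

# PCRE engine: the escapes are honoured, so the class is just "[" or "]"
pcre_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

print(tre_matches)   # two-character matches: "[]" and "\\]"
print(pcre_matches)  # single-character matches: "[" "]" "]" "]"
```

Under TRE every match is two characters long, which is why a lone [ or ] is never removed by this pattern.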
Note that the terms "POSIX bracket expression" and "NFA character class" are used here in the same sense as at https://www.regular-expressions.info. This is not quite standard terminology, but there is a need to differentiate between the two.