I'm trying to use gsub in R to strip a bunch of odd characters from some strings I'm processing. Everything works, except that as soon as I add "]" to the character class the whole call does nothing. I'm escaping it with \\, like
gsub("[\\?\\*\\]]", "", name)
but it's still not working. Here's my actual example:
name <- "R U Still Down? [Remember Me]"
What I want is for names to be "R U Still Down Remember Me".
When I do:
names <- gsub("[\\(\\)\\*\\$\\+\\?'\\[]", "", name)
it semi-works and I get "R U Still Down Remember Me]"
but when I do:
names <- gsub("[\\(\\)\\*\\$\\+\\?'\\[\\]]", "", name)
nothing happens (i.e. I get "R U Still Down? [Remember Me]").
Any ideas? I've tried switching around the order of things, etc. But I can't seem to figure it out.
Since square brackets delimit a character class in a regex, a bracket cannot simply be written as-is inside the class when we want to match it literally. See the example below.
Character escaping is what allows certain characters (reserved by the regex engine for manipulating searches) to be literally searched for and found in the input string. Escaping depends on context, therefore this example does not cover string or delimiter escaping. Saying that backslash is the "escape" character is a bit misleading.
All you need is this: (if (looking-at " [ []") (insert "f")). In general, "special" regexp characters are not special within brackets. See the Elisp manual, node Regexp Special.
Since both R strings and regex use the same escape character, "\", building correct patterns for grep, sub, gsub or any other function that accepts a pattern argument will often need backslashes written in pairs.
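A minimal sketch of that pairing, assuming nothing beyond base R (the variable names are only illustrative): the R parser consumes one level of backslashes before the regex engine ever sees the pattern.

```r
# "\\" in R source is a string holding a single backslash
one_backslash <- "\\"
nchar(one_backslash)   # 1

# The regex \. (a literal dot) has to be typed as "\\." in R source
no_dots <- gsub("\\.", "", "a.b.c")
print(no_dots)         # "abc"

# The regex \\ (a literal backslash) has to be typed as "\\\\"
no_backslashes <- gsub("\\\\", "", "a\\b")
print(no_backslashes)  # "ab"
```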
Just set the perl=TRUE parameter. With the PCRE engine, a backslash escapes ] inside a character class.
> gsub("[?\\]\\[*]", "", name, perl=TRUE)
[1] "R U Still Down Remember Me"
And escape only the needed characters.
> gsub("[()*$+?'\\[\\]]", "", name, perl=TRUE)
[1] "R U Still Down Remember Me"
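To see why the switch matters, here is a small side-by-side sketch using the string from the question. It assumes the behaviour the question reports: the default engine treats a backslash inside [...] as an ordinary character, so the class closes at the first "]" and the pattern then requires a literal "]" right after it.

```r
name <- "R U Still Down? [Remember Me]"

# Default engine: the class is {?, *, \, [} and must be followed by a literal "]".
# No such pair exists in the string, so nothing is replaced.
default_result <- gsub("[?*\\[\\]]", "", name)

# PCRE: the backslash escapes "]" inside the class, so "]" is part of the set.
pcre_result <- gsub("[?*\\[\\]]", "", name, perl = TRUE)

print(default_result)
print(pcre_result)
```

The pattern is identical in both calls. Only the engine's reading of the backslash inside the brackets differs.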
You can reorder the character class instead of escaping: a ] placed first in the class is taken literally.
name <- 'R U Still Down? [Remember Me][*[[]*'
gsub('[][?*]', '', name)
# [1] "R U Still Down Remember Me"
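Applied to the full character class from the question, the same reordering might look like the sketch below. The second call is an extra illustration, not from the question, of the positional rules documented in ?regex: "]" goes first, "^" anywhere but first, and "-" first or last.

```r
name <- "R U Still Down? [Remember Me]"

# "]" comes first, so it is literal. No backslashes are needed inside the class.
names <- gsub("[]()*$+?'[]", "", name)
print(names)

# Illustrative string: "]" first, "^" not first, "-" last
cleaned <- gsub("[][^-]", "", "a[b]c^d-e")
print(cleaned)
```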
If you want to remove all punctuation characters, use the POSIX class [:punct:] inside a bracket expression:
gsub('[[:punct:]]', '', name)
In the ASCII range, this class matches every character that is not a control character, alphanumeric, or a space.
ascii <- rawToChar(as.raw(0:127), multiple=TRUE)
paste(ascii[grepl('[[:punct:]]', ascii)], collapse="")
# [1] "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"