 

How to escape closed bracket "]" in regex in R

Tags: regex, r, gsub

I'm trying to use gsub in R to replace a bunch of weird characters in some strings I'm processing. Everything works, except that whenever I add "]" the whole thing does nothing. I'm escaping with \\, as in gsub("[\\?\\*\\]]", "", name), but it's still not working. Here's my actual example:

name <- "R U Still Down? [Remember Me]"

What I want is for names to be "R U Still Down Remember Me".

When I do names <- gsub("[\\(\\)\\*\\$\\+\\?'\\[]", "", name), it semi-works and I get "R U Still Down Remember Me]".

But when I do names <- gsub("[\\(\\)\\*\\$\\+\\?'\\[\\]]", "", name), nothing happens (i.e. I get "R U Still Down? [Remember Me]").

Any ideas? I've tried switching around the order of things, etc., but I can't seem to figure it out.

seth127 asked Aug 17 '15 00:08
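Why the escaped pattern does nothing: R's default regex engine (TRE, used when perl = FALSE) follows POSIX rules for bracket expressions. Under those rules a backslash is an ordinary character, not an escape. So the first ] closes the class, and the final ] becomes a separate literal that must follow the matched character.

The sketch below checks that reading. It assumes the default engine, and the test strings are made up for illustration.

```r
name <- "R U Still Down? [Remember Me]"
pat <- "[\\?\\*\\]]"  # regex text seen by the engine: [\?\*\]]

# Assumption: POSIX/TRE rules, where [\?\*\] is the whole bracket
# expression (the set \ ? *) and the trailing ] is a literal after it.
# The pattern therefore only matches one of \ ? * immediately followed by ].
grepl(pat, name)         # FALSE: nothing in name matches
gsub(pat, "", name)      # name is returned unchanged
grepl(pat, "Down?]")     # TRUE: "?]" is a match
gsub(pat, "", "Down?]")  # "Down"
```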





2 Answers

Just enable the perl=TRUE parameter:

> gsub("[?\\]\\[*]", "", name, perl = TRUE)
[1] "R U Still Down Remember Me"

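And escape only the characters that need it. Inside a character class, ( ) * $ + ? and ' are already literal, so only the brackets are escaped here: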

> gsub("[()*$+?'\\[\\]]", "", name, perl = TRUE)
[1] "R U Still Down Remember Me"
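For comparison, here is a small sketch (not part of the original answer) that runs the question's own pattern unchanged under both engines. With perl = TRUE, PCRE treats \\[ and \\] inside a class as escaped brackets, so the pattern behaves as the asker intended.

```r
name <- "R U Still Down? [Remember Me]"
pat <- "[\\(\\)\\*\\$\\+\\?'\\[\\]]"  # the pattern from the question

gsub(pat, "", name)               # default TRE engine: name unchanged
gsub(pat, "", name, perl = TRUE)  # PCRE: "R U Still Down Remember Me"
```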
Avinash Raj answered Sep 29 '22 11:09



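You can avoid escaping altogether by reordering the character class. A ] placed immediately after the opening [ is treated as a literal character rather than the end of the class.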

name <- 'R U Still Down? [Remember Me][*[[]*'
gsub('[][?*]', '', name)
# [1] "R U Still Down Remember Me"
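The same trick extends to the full character list from the question. This sketch assumes the default engine and uses a hypothetical variable x:

- ] goes first so it is taken literally.
- [ needs no escaping inside a bracket expression.
- Under POSIX rules, a ^ would also have to avoid the first position and a - would belong last. Neither appears here.

```r
x <- "R U Still Down? [Remember Me]"
# ] first (literal), then [ ( ) * $ + ? ' with no escapes needed
gsub("[][()*$+?']", "", x)
# [1] "R U Still Down Remember Me"
```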

If you want to remove all punctuation characters, use the POSIX class [:punct:] inside a bracket expression:

gsub('[[:punct:]]', '', name)
# [1] "R U Still Down Remember Me"

In the ASCII range, this class matches every character that is not a control character, alphanumeric, or a space:

ascii <- rawToChar(as.raw(0:127), multiple = TRUE)
paste(ascii[grepl('[[:punct:]]', ascii)], collapse="")
# [1] "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
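One caveat before reaching for [:punct:]: it is broader than the question's list, so apostrophes and hyphens inside words are removed too. A quick sketch with a made-up string:

```r
x <- "Don't stop-motion? [Live]"
# ' and - are in [:punct:], so they disappear along with ? [ ]
gsub("[[:punct:]]", "", x)
# [1] "Dont stopmotion Live"
```

If those characters must survive, an explicit class such as the one in the reordering example above is the safer choice.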
hwnd answered Sep 29 '22 10:09
