
"invalid regular expression...reason 'Trailing backslash''' error with gsub in R

Tags:

r

gsub

I am getting an error message while replacing text in R.

 x
 [1] "Easy bruising and bleeding.\\"

gsub(as.character(x), "\\", "")
Error in gsub(as.character(x), "\\", "") : 
   invalid regular expression 'Easy bruising and bleeding.\', reason 'Trailing backslash'
Manish asked Mar 31 '14 07:03



2 Answers

The arguments are in the wrong order: gsub expects the pattern first, then the replacement, then the string to search. See help("gsub").

gsub( "\\", "", "Easy bruising and bleeding.\\", fixed=TRUE)
#[1] "Easy bruising and bleeding."
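As a small sketch of the same fix, naming the arguments makes the expected order explicit. The variable names `x` and `cleaned` are only illustrative; `pattern`, `replacement`, `x`, and `fixed` are the argument names documented in help("gsub").

```r
# The string from the question: it ends in ONE literal backslash.
x <- "Easy bruising and bleeding.\\"

# Named arguments make the order (pattern, replacement, x) hard to get wrong.
# fixed = TRUE treats the pattern as plain text, so "\\" (one backslash)
# needs no regex escaping.
cleaned <- gsub(pattern = "\\", replacement = "", x = x, fixed = TRUE)
print(cleaned)
```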
Roland answered Nov 15 '22 00:11



tl;dr: You need 4 \s (i.e. \\\\) in the first argument of gsub in order to find one literal \ in the third argument of gsub. The overall process is:

  • gsub receives \\\\, passes \\
  • regex receives \\, searches for a literal \

To avoid fixed = TRUE, which precludes doing more complex searches, your code should be:

> gsub( "\\\\", "", "Easy bruising and bleeding.\\")
[1] "Easy bruising and bleeding."
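To illustrate what a regex pattern allows that fixed = TRUE does not, here is a hedged sketch that removes backslashes only at the end of a string. The example string `y` and the name `trimmed` are made up for illustration; they are not from the question.

```r
# Hypothetical input: backslashes in the middle and at the end.
y <- "C:\\temp\\"

# "\\\\+$" reaches the regex engine as \\+$ : one or more literal
# backslashes, anchored at the end of the string.
trimmed <- sub("\\\\+$", "", y)
writeLines(trimmed)  # writeLines shows the actual characters, without R's escaping
```

With fixed = TRUE the `+` and `$` would be treated as plain characters, so an anchored match like this would not be possible.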

Explanation: The reason you need 4 \ is that \ is a special character for the regex engine, so in order for the regex engine to find a literal \ it needs to be passed \\; the first \ indicates that the second \ is not a special character but a \ that should be matched literally. Thus regex receives \\ and searches for \ in the string.

\ is also a special character for R, so in order for gsub to pass \\ to the regex engine, gsub needs to receive \\\\. The first \ indicates that the second \ is a literal \ and not a special character; the third \ does the same thing for the fourth \. Thus gsub receives \\\\ and passes \\ to the regex engine.

Again, the overall process is: gsub receives \\\\, passes \\; regex receives \\, searches for a literal \.
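The two layers of escaping can be inspected directly. This is a minimal sketch; the names `pattern`, `result`, and `msg` are illustrative only.

```r
# Four backslashes in source code are stored as two characters.
pattern <- "\\\\"
nchar(pattern)        # number of characters actually stored in the string
writeLines(pattern)   # shows the raw characters the regex engine will see

# Those two characters are the regex for one literal backslash.
result <- gsub(pattern, "", "a\\b")
print(result)

# By contrast, "\\" is a single backslash, which is an incomplete escape
# when used as a regex; tryCatch captures the resulting error message.
msg <- tryCatch(gsub("\\", "", "a\\b"),
                error = function(e) conditionMessage(e))
print(msg)
```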

Note: while the string that you gave us prints to the screen as "Easy bruising and bleeding.\\", the string is actually Easy bruising and bleeding.\. The first \ is actually just an escape for the second \. You can verify this with this code:

> cat("Easy bruising and bleeding.\\")
Easy bruising and bleeding.\
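Another way to check this, sketched below, is to count the backslashes rather than print them. The names `x`, `last_char`, and `n_backslashes` are illustrative.

```r
x <- "Easy bruising and bleeding.\\"

# The last character of the string is a single backslash.
last_char <- substring(x, nchar(x))
print(last_char == "\\")

# Count literal backslashes; fixed = TRUE avoids any regex escaping.
n_backslashes <- lengths(regmatches(x, gregexpr("\\", x, fixed = TRUE)))
print(n_backslashes)
```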

That's why the code I suggest has 4 \s and not 8 \s.

Josh answered Nov 15 '22 00:11
