
Replace single backslash in R

I have a string that looks like:

str<-"a\f\r" 

I'm trying to remove the backslashes but nothing works:

gsub("\","",str, fixed=TRUE)
gsub("\\","",str)
gsub("(\)","",str)
gsub("([\])","",str)

...basically every variation I can think of. I have even tried the str_replace_all function from stringr. Any help?

I'm using R version 3.1.1; Mac OSX 10.7; the dput for a single string in my vector of strings gives:

dput(line)
"ud83d\ude21\ud83d\udd2b"

I imported the file using readLines from a standard .txt file. The content of the file looks something like:

got an engineer booked for this afternoon \ud83d\udc4d all now hopefully sorted\ud83d\ude0a I m going to go insane ud83d\ude21\ud83d\udd2b in utf8towcs …

Thanks.

Tavi asked Aug 21 '14 10:08



2 Answers

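One fairly universal solution is to match the backslash with a regular expression, where it has to be escaped twice (once for the R string literal, once for the regex engine):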

gsub("\\\\", "", str) 

Thanks to the comment above.
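
As a rough sketch of why four backslashes are needed here while two are enough with fixed=TRUE, the following compares both calls. The variable names (s, regex_result, fixed_result) are made up for illustration and are not from the question.

```r
# Hypothetical example string, typed with doubled backslashes,
# so it really holds the 5 characters: a, backslash, f, backslash, r.
s <- "a\\f\\r"
nchar(s)

# Regex version: the R literal "\\\\" is a two-character string (two backslashes),
# which the regex engine then reads as one escaped, literal backslash.
regex_result <- gsub("\\\\", "", s)

# fixed = TRUE version: there is no regex layer, so the pattern only needs
# the R-level escape, i.e. a single backslash written as "\\".
fixed_result <- gsub("\\", "", s, fixed = TRUE)

# Both approaches should strip the same characters.
identical(regex_result, fixed_result)
```

Neither call does anything useful on a string typed as "a\f\r", because that string contains no backslash characters in the first place.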

JelenaČuklina answered Oct 08 '22 01:10


When inputting backslashes from the keyboard, always escape them.

str <- "this\\is\\my\\string"      # note doubled backslashes -> 'this\is\my\string'
gsub("\\", "", str, fixed=TRUE)   # ditto

str2 <- "a\\f\\r"                  # ditto -> 'a\f\r'
gsub("\\", "", str2, fixed=TRUE)  # ditto

Note that if you do

str <- "a\f\r" 

then str contains no backslashes. It consists of the 3 characters a, \f (form feed, which is not normally printable except as \f) and \r (carriage return, likewise).
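
A quick way to check this for yourself is to look at the length and code points of the string. This is only a sketch; x is an illustrative name used instead of str.

```r
# The literal "a\f\r" is parsed by R into three characters:
# the letter a, a form feed, and a carriage return.
x <- "a\f\r"

n_chars <- nchar(x)        # number of characters in the string
code_points <- utf8ToInt(x)  # 97 = a, 12 = form feed, 13 = carriage return

# There is no backslash character to find, which is why gsub seems to "fail".
has_backslash <- grepl("\\", x, fixed = TRUE)

# To drop the control characters themselves, match them directly.
stripped <- gsub("[\f\r]", "", x)
```

So if the goal is to get rid of what is printed as \f and \r, the thing to remove is the control characters, not backslashes.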

And just to head off a possible question: if your data was read from a file, the file doesn't need to have doubled backslashes. For example, if you have a file test.txt containing

a\b\c\d\e\f 

and you do

str <- readLines("test.txt") 

then str will contain the string a\b\c\d\e\f as you'd expect: 6 letters separated by 5 single backslashes. But you still have to type doubled backslashes if you want to work with it.

str <- gsub("\\", "", str, fixed=TRUE)  # now contains abcdef 
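
The file case can be reproduced end to end with a temporary file. This is a minimal sketch; the temp file stands in for test.txt, and the names tmp, from_file and cleaned are illustrative.

```r
# Write a file whose single line is a\b\c\d\e\f (single backslashes on disk).
tmp <- tempfile(fileext = ".txt")
writeLines("a\\b\\c\\d\\e\\f", tmp)

# Read it back: the string holds 6 letters and 5 single backslashes.
from_file <- readLines(tmp)
n_chars <- nchar(from_file)

# print() and dput() show backslashes escaped (doubled);
# cat() and writeLines() show the raw characters.
print(from_file)
cat(from_file, "\n")

# Remove the backslashes; the pattern is still typed as "\\".
cleaned <- gsub("\\", "", from_file, fixed = TRUE)

unlink(tmp)  # clean up the temporary file
```

The difference between print and cat is worth keeping in mind when reading dput output: doubled backslashes there are a display convention, not extra characters in the data.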

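From the dput, it looks like what you've got there is UTF-16 encoded text, which probably came from a Windows machine. According to

  • https://en.wikipedia.org/wiki/Unicode#Character_General_Category
  • https://en.wikipedia.org/wiki/UTF-16

sequences like \ud83d\ude21 are surrogate pairs, which UTF-16 uses to encode characters in the Supplementary Multilingual Plane, which is pretty obscure. My guess is that you need to tell R the encoding when you read the file. Note that the encoding argument of readLines only marks strings as Latin-1 or UTF-8 and does not re-encode the input, so the usual route is to open the file with file("yourfile.txt", encoding="UTF-16") (with yourfile.txt standing in for your actual path) and pass that connection to readLines.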
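
To see which characters those sequences stand for, the two 16-bit halves of a UTF-16 surrogate pair can be combined into one code point by hand. The helper below is a hypothetical sketch written for this answer (surrogate_to_codepoint is not a base R function); it only applies the standard surrogate-pair arithmetic.

```r
# Combine a UTF-16 surrogate pair, given as hex strings, into one code point.
# Hypothetical helper for illustration; not part of base R.
surrogate_to_codepoint <- function(high_hex, low_hex) {
  high <- strtoi(high_hex, base = 16L)
  low  <- strtoi(low_hex,  base = 16L)
  # High surrogates lie in D800..DBFF, low surrogates in DC00..DFFF.
  stopifnot(high >= 0xD800, high <= 0xDBFF, low >= 0xDC00, low <= 0xDFFF)
  as.integer(0x10000 + (high - 0xD800) * 0x400 + (low - 0xDC00))
}

# The two pairs from the dput output in the question.
cp1 <- surrogate_to_codepoint("d83d", "de21")
cp2 <- surrogate_to_codepoint("d83d", "dd2b")

label1 <- sprintf("U+%X", cp1)  # "U+1F621"
label2 <- sprintf("U+%X", cp2)  # "U+1F52B"

# intToUtf8() turns a code point back into a character, if your console can show it.
ch1 <- intToUtf8(cp1)
```

Both code points fall in the range U+10000 to U+1FFFF, that is, the Supplementary Multilingual Plane.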

Hong Ooi answered Oct 08 '22 01:10