I have a string that looks like:
str<-"a\f\r"
I'm trying to remove the backslashes but nothing works:
gsub("\","",str, fixed=TRUE)
gsub("\\","",str)
gsub("(\)","",str)
gsub("([\])","",str)
...basically all the variations you can imagine. I have even tried the string_replace_all function. Any help?
I'm using R version 3.1.1 on Mac OS X 10.7. The dput for a single string in my vector of strings gives:
dput(line)
"ud83d\ude21\ud83d\udd2b"
I imported the file using readLines from a standard .txt file. The content of the file looks something like:
got an engineer booked for this afternoon \ud83d\udc4d all now hopefully sorted\ud83d\ude0a I m going to go insane ud83d\ude21\ud83d\udd2b in utf8towcs …
Thanks.
To match a character that has special meaning in a regex, prefix it with a backslash ( \ ) to form an escape sequence. E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" . Likewise, the regex \\ matches "\" (a backslash).
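A minimal sketch of those escapes as they would be typed in R, where each regex backslash is itself doubled inside the string literal. The variable names are only for illustration.

```r
# Each pattern escapes a regex metacharacter; the backslash is doubled
# because the R string parser also treats "\" as an escape character.
no_dots   <- gsub("\\.", "", "a.b.c")      # regex \.  matches "."
no_plus   <- gsub("\\+", "", "1+2+3")      # regex \+  matches "+"
no_paren  <- gsub("\\(", "", "f(x)")       # regex \(  matches "("
no_slash  <- gsub("\\\\", "", "a\\b\\c")   # regex \\  matches "\"

print(c(no_dots, no_plus, no_paren, no_slash))
```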
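The "\" is an escape character saying "within this string, the following character should be taken as is." Thus, if you want to actually look for "\" you need to escape it. However, when writing regular expressions in R, you have to "double escape": the R string parser consumes one level of backslashes and the regex engine consumes another. That is why you see "\\" in patterns such as "\\.", and "\\\\" when the target is a literal backslash.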
In R (and elsewhere), the backslash is the “escape” symbol, which is followed by another symbol to indicate a special character. For example, "\t" represents a “tab” and "\n" is the symbol for a new line (hard return).
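A quick way to see this is to count characters. The sketch below uses only base R; the variable names are illustrative.

```r
# "\t" is ONE character (a tab); "\\t" is TWO (a backslash and the letter t).
tab_len     <- nchar("\t")
literal_len <- nchar("\\t")

print(c(tab_len, literal_len))

# cat() writes the actual characters: a, a tab, b, then a newline.
cat("a\tb\n")
```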
One quite universal solution is
gsub("\\\\", "", str)
Thanks to the comment above.
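For comparison, here is a small sketch showing that the four-backslash regex and the fixed = TRUE form with two backslashes should give the same result on a string that really contains backslashes. The sample string is made up for illustration.

```r
# The string below holds: this\is\my\string (single backslashes).
s <- "this\\is\\my\\string"

# Regex form: the pattern string is \\ after R parsing, which the
# regex engine reads as one literal backslash.
regex_result <- gsub("\\\\", "", s)

# Fixed form: no regex engine involved, so one level of escaping suffices.
fixed_result <- gsub("\\", "", s, fixed = TRUE)

print(regex_result)
print(identical(regex_result, fixed_result))
```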
When inputting backslashes from the keyboard, always escape them.
str <- "this\\is\\my\\string"           # note doubled backslashes -> 'this\is\my\string'
gsub("\\", "", str, fixed=TRUE)         # ditto
str2 <- "a\\f\\r"                       # ditto -> 'a\f\r'
gsub("\\", "", str2, fixed=TRUE)        # ditto
Note that if you do
str <- "a\f\r"
then str contains no backslashes. It consists of the 3 characters a, \f (which is not normally printable, except as \f), and \r (same).
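This can be checked directly. The sketch below inspects the string and, as one possible approach not taken from the answer itself, strips the control characters with the POSIX class [[:cntrl:]].

```r
s <- "a\f\r"

n_chars       <- nchar(s)                        # number of characters
has_backslash <- grepl("\\", s, fixed = TRUE)    # any literal backslash?
code_points   <- utf8ToInt(s)                    # a, form feed, carriage return

print(n_chars)
print(has_backslash)
print(code_points)

# If the goal is to drop the form feed and carriage return themselves,
# matching control characters is one option (an assumption about intent).
letters_only <- gsub("[[:cntrl:]]", "", s)
print(letters_only)
```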
And just to head off a possible question: if your data was read from a file, the file doesn't have to have doubled backslashes. For example, if you have a file test.txt containing
a\b\c\d\e\f
and you do
str <- readLines("test.txt")
then str will contain the string a\b\c\d\e\f as you'd expect: 6 letters separated by 5 single backslashes. But you still have to type doubled backslashes in your R code if you want to work with it.
str <- gsub("\\", "", str, fixed=TRUE) # now contains abcdef
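The file round trip can be reproduced without creating test.txt by hand. This is a sketch that uses a temporary file as a stand-in; tmp, raw_line and cleaned are illustrative names.

```r
# Write a\b\c\d\e\f to a temporary file. The doubled backslashes exist
# only in the R source; the file itself gets single backslashes.
tmp <- tempfile(fileext = ".txt")
writeLines("a\\b\\c\\d\\e\\f", tmp)

raw_line <- readLines(tmp)
n_chars  <- nchar(raw_line)   # 6 letters + 5 backslashes

# Remove the backslashes with a fixed (non-regex) match.
cleaned <- gsub("\\", "", raw_line, fixed = TRUE)

unlink(tmp)                   # clean up the temporary file
print(n_chars)
print(cleaned)
```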
From the dput, it looks like what you've got there is UTF-16 encoded text, which probably came from a Windows machine. The escapes appear to encode glyphs in the Supplementary Multilingual Plane, which is pretty obscure. I'll guess that you need to supply the argument encoding="UTF-16" to readLines when you read in the file.
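If re-reading the file is not an option, and assuming the file holds the escapes as literal text (a backslash, the letter u, and four hex digits), one possible workaround is to strip those sequences with a regex. This is only a sketch under that assumption; the sample line is adapted from the question.

```r
# Assumption: the escapes are literal characters in the data, so the
# R literal below doubles each backslash to reproduce them.
line <- "all now hopefully sorted\\ud83d\\ude0a I m going to go insane"

# \\\\u        -> a literal backslash followed by "u"
# [0-9a-fA-F]{4} -> exactly four hexadecimal digits
stripped <- gsub("\\\\u[0-9a-fA-F]{4}", "", line)

print(stripped)
```

This removes only sequences that start with a backslash, so a fragment such as ud83d without its leading backslash would be left in place.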