
R: How to replace space (' ') in string with a *single* backslash and space ('\ ')

Tags: regex, replace, r

I've searched many times, and haven't found the answer here or elsewhere. I want to replace each space ' ' in variables containing file names with a '\ '. (A use case could be for shell commands, with the spaces escaped, so each file name doesn't appear as a list of arguments.) I have looked through the StackOverflow question "how to replace single backslash in R", and find that many combinations do work as advertised:

> gsub(" ", "\\\\", "a b")
[1] "a\\b"

> gsub(" ", "\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

but try these with a single-slash version, and R ignores it:

> gsub(" ", "\\ ", "a b")
[1] "a b"

> gsub(" ", "\ ", "a b", fixed = TRUE)
[1] "a b"

Going in the opposite direction, removing backslashes from a string, works when there are two of them:

> gsub("\\\\", " ", "a\\b")
[1] "a b"

> gsub("\\", " ", "a\\b", fixed = TRUE)
[1] "a b"

However, for single slashes some inner perversity in R prevents me from even attempting to remove them:

> gsub("\\", " ", "a\\b")
Error in gsub("\\", " ", "a\\b") : 
  invalid regular expression '\', reason 'Trailing backslash'

> gsub("\", " ", "a\b", fixed = TRUE)
Error: unexpected string constant in "gsub("\", " ", ""

The 'invalid regular expression' message is telling us something, but I don't see what. (Note too that the perl = TRUE option does not help.)

Even with three backslashes, R fails to notice even one:

> gsub(" ", "\\\ ", "a b")
[1] "a b"

The pattern extends too! Even multiples of two work:

> gsub(" ", "\\\\\\\\", "a b")
[1] "a\\\\b"

but not odd multiples (should get '\\\ '):

> gsub(" ", "\\\\\\ ", "a b")
[1] "a\\ b"

> gsub(" ", "\\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

(I would expect 3 slashes, not two.)

My two questions are:

  • How can my goal of replacing a ' ' with a '\ ' be accomplished?
  • Why did the odd number-slash variants of the replacements fail, while the even number-slash replacements worked?

For shell commands a simple work-around is to quote the file names, but part of my interest is just wanting to understand what is going on with R's regex engine.
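The quoting work-around mentioned above can be sketched with base R's shQuote(). The file name and the ls command here are made up for illustration, and type = "sh" is passed explicitly because the default quoting style differs between platforms:

```r
# Hypothetical file name containing a space
file_name <- "my report.txt"

# shQuote() wraps the name in single quotes for a POSIX-style shell,
# so the space no longer splits it into two arguments
quoted <- shQuote(file_name, type = "sh")

# Build the command string; writeLines() shows it without print()'s escaping
cmd <- paste("ls -l", quoted)
writeLines(cmd)
```

This sidesteps the backslash question entirely, which is why it is only a work-around and not an answer to what gsub is doing.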

user3897315 asked Aug 12 '16 02:08


2 Answers

Get ready for a face-palm, because this:

> gsub(" ", "\\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

is actually working.

The two backslashes you see are just the R console's way of displaying a single backslash, which is escaped when printed to the screen.
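A quick way to check this without leaving the console is to count the characters in the result. This is a minimal sketch using base R's nchar() and writeLines(); the variable name x is only for illustration:

```r
# Replace the space with a backslash followed by a space.
# With fixed = TRUE the replacement is taken literally (no regex escaping).
x <- gsub(" ", "\\ ", "a b", fixed = TRUE)

# The result holds 4 characters: a, one backslash, a space, b
nchar(x)

# print() shows the escaped form with a doubled backslash
print(x)

# writeLines() shows the raw contents, with a single backslash
writeLines(x)
```

If the string really contained two backslashes, nchar() would report 5 rather than 4.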

To confirm that the replacement with a single backslash is indeed working, try writing the output to a text file and inspecting it yourself:

f <- file("C:\\output.txt")
writeLines(gsub(" ", "\\", "a b", fixed = TRUE), f)
close(f)

In output.txt you should see the following:

a\b
Tim Biegeleisen answered Nov 03 '22 00:11


Very helpful discussion! (I've been Googling the heck out of this for 2 days.)

Another way to see the difference (rather than writing to a file) is to compare the contents of the string using print and cat.

z <- gsub(" ", "\\ ", "a b", fixed = TRUE)

> print(z)
[1] "a\\ b"

> cat(z)
a\ b

So, by using cat instead of print we can confirm that the gsub line is doing what was intended when we're trying to add single backslashes to a string.
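As for why the regex (non-fixed) variants in the question behave differently, a likely explanation is that the replacement passes through two rounds of escaping: first R's string parser, and then, unless fixed = TRUE is set, gsub's own handling of backslashes in the replacement. The sketch below illustrates that reading; the variable names are only for illustration:

```r
# Layer 1: the R parser turns each "\\" in source code into one backslash.
nchar("\\")      # 1 character
nchar("\\\\")    # 2 characters

# Layer 2: without fixed = TRUE, a backslash in the replacement escapes the
# character after it, so two backslash characters produce one in the result.
regex_out <- gsub(" ", "\\\\ ", "a b")               # replacement: \\ + space
fixed_out <- gsub(" ", "\\ ", "a b", fixed = TRUE)   # replacement: \  + space

# Both calls give the same 4-character string: a, backslash, space, b
identical(regex_out, fixed_out)
writeLines(regex_out)
```

Under this reading, a lone backslash before the space in regex mode simply escapes the space, which would explain why those attempts returned "a b" unchanged.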

D. Woods answered Nov 03 '22 00:11