R String Interpretation: why does "\040" get interpreted as " " and what other potential pitfalls could I come across in string interpretation?

Question

I was helping someone today regex some info out of a PDF file that we had read in as a .txt file. Unfortunately, the tm package's readPDF function was not working correctly at the time, though with a few modifications we were able to get it to work just fine. While we were regexing some of the fluff out of the .txt file, we found something that surprised most of us: the string "\040" gets interpreted as a space, " ".

> x <- "\040"
> x
[1] " "

This doesn't happen for other, similar character strings (e.g. "\n" or "\t") where you might expect it to.

> y <- "\n"
> y
[1] "\n"
> z <- "\t"
> z
[1] "\t"

Why is this? What other character strings are interpreted differently in R?

EDIT:

After some simple experimentation, it seems that any "\xxx", where each x is a digit, yields a different result. What is the purpose of this?

Thomas · Accepted Answer

Take a look here: http://stat.ethz.ch/R-manual/R-devel/library/base/html/Quotes.html

Backslash is used to start an escape sequence inside character constants. Escaping a character not in the following table is an error.

...

\nnn character with given octal code (1, 2 or 3 digits)

Then take a look at an ASCII table to see how octal codes are represented. As you will see, octal 040 (decimal 32) is a space.
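To check the octal mapping yourself without a printed table, something along these lines should work. It is only a sketch using base R functions (strtoi, utf8ToInt, intToUtf8); the variable names are made up for illustration.

```r
# Read the digits of the escape "\040" as a base-8 number.
code <- strtoi("040", base = 8L)

# utf8ToInt() returns the code point of a single character,
# so this is the code point that "\040" actually stores.
stored <- utf8ToInt("\040")

# Both should be 32, the ASCII code for a space.
same_code <- identical(code, stored)

# Going the other way: build the character from its code point.
from_code <- intToUtf8(code)

# The octal escape, the hex escape and a literal space are the same string.
octal_is_space <- identical("\040", " ")
hex_is_space   <- identical("\x20", " ")

print(code)
print(same_code)
print(octal_is_space)
```

The point is that "\040" is not converted to a space at print time. The parser stores a single space character when it reads the string constant, so by the time a regex sees the string, the backslash and digits are already gone.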

And just for fun:

> '\110\145\154\154\157\040\127\157\162\154\144\041'
[1] "Hello World!"
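As for why "\n" and "\t" look as if they are left alone: as far as I can tell they are interpreted in exactly the same way, but print() shows non-printable characters in escaped form again, whereas a space is printable and is shown as is. A rough sketch to illustrate this (variable names are hypothetical):

```r
y <- "\n"
z <- "\t"

# Each string holds ONE character, not a backslash plus a letter.
len_y <- nchar(y)
len_z <- nchar(z)

# Their code points are the ASCII newline (10) and tab (9).
code_y <- utf8ToInt(y)
code_z <- utf8ToInt(z)

# The octal escapes for those codes give identical strings.
newline_same <- identical("\012", "\n")
tab_same     <- identical("\011", "\t")

# print() re-escapes the character for display; cat() writes it raw.
print(y)
cat("a", z, "b", y, sep = "")

# To keep a literal backslash followed by 040, escape the backslash itself.
literal <- "\\040"
len_literal <- nchar(literal)
cat(literal, "\n")
```

So one pitfall to watch for is that what print() shows is not always what the string contains. cat() or nchar() gives a more direct view of the stored characters.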

Tags:

string

r

string-interpolation

stanekam
