This is related to a previous question, here: Converting a \u escaped Unicode string to ASCII
I proposed a solution involving eval(parse(text=x)), which for non-R users means what it says: parsing the text string, then evaluating it. The aim was not to allow arbitrary code to be executed, but only to un-escape escaped Unicode text. Hence the solution:
eval(parse(text=paste0("'", x, "'")))
While this should be fairly safe given the restricted objective, I'd be interested to know: how much sanitisation is required to keep things safe?
At a minimum, I guess any embedded single and double quotes have to be escaped. For example, suppose we have
x <- "this is a '; print(dir()); 'string"
Then eval'ing this per the snippet above would execute the code in the middle. So we have to escape the quotes:
eval(parse(text=paste0("'",
gsub("'", "\\\\'", x),
"'")))
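To make the difference concrete, here is a minimal sketch that runs both variants in scratch environments. The payload is changed from print(dir()) to an assignment (injected <- TRUE) purely so the effect can be checked afterwards; the names env, env2 and injected are illustrative and not part of the original snippet.

```r
# Scratch environment so that any injected assignment is easy to detect.
env <- new.env()
x <- "this is a '; injected <- TRUE; 'string"

# Naive wrapping: the embedded quotes close the literal early,
# so the middle part is parsed and evaluated as code.
naive <- paste0("'", x, "'")
eval(parse(text = naive), envir = env)
naive_ran_code <- exists("injected", envir = env, inherits = FALSE)

# With the quotes escaped, the whole input stays inside one string literal.
env2 <- new.env()
escaped <- paste0("'", gsub("'", "\\\\'", x), "'")
result <- eval(parse(text = escaped), envir = env2)
escaped_ran_code <- exists("injected", envir = env2, inherits = FALSE)

naive_ran_code        # TRUE
escaped_ran_code      # FALSE
identical(result, x)  # TRUE
```

This only shows that the quote escaping handles this particular input; it says nothing about other characters.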
And similarly for double quotes. I don't think the unescaped Unicode equivalents \u0022 and \u0027 are a problem, since to the parser they'll be identical to plain " and '.
Are there any holes in this approach that I've missed?
Yes: backslashes. The input
this is a \'; print(dir()); 'string
is escaped to:
'this is a \\'; print(dir()); 'string'
The double backslash is evaluated as a literal backslash, so the quote after it is active again and ends the string; the text that follows is treated as code, not as string content.
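A small sketch of this bypass in R, under two assumptions: escape_quotes is a hypothetical name for the gsub step from the question, and the payload is an assignment so the result can be checked. Because gsub would also escape any later quote in the input (leaving a stray backslash outside a string, which does not parse), this variant ends the payload with # so that the rest of the line, including the closing quote, becomes a comment.

```r
# Hypothetical helper: the quote-escaping step from the question.
escape_quotes <- function(x) gsub("'", "\\\\'", x)

env <- new.env()

# Characters in x:        a \'; injected <- TRUE #
x <- "a \\'; injected <- TRUE #"

# Characters in wrapped:  'a \\'; injected <- TRUE #'
wrapped <- paste0("'", escape_quotes(x), "'")

# The literal ends after the double backslash, the assignment runs,
# and the trailing quote is swallowed by the comment.
eval(parse(text = wrapped), envir = env)
bypassed <- exists("injected", envir = env, inherits = FALSE)
bypassed  # TRUE
```

Escaping backslashes before escaping quotes would close this particular hole, but it would also stop the \u sequences from being interpreted, which defeats the purpose of the eval.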
Also, I don't know about R, but you could probably at minimum cause a crash using raw control characters like newline, or invalid escapes.
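For the invalid-escape case, a quick check in R (a sketch, not an exhaustive test) suggests the failure is a parse error and not a hard crash: R's parser rejects unrecognized escapes such as \q inside a string literal. The variable names below are illustrative.

```r
# Characters in x:  bad \q escape
x <- "bad \\q escape"

# parse() signals an error for the unrecognized escape, so eval() is never reached.
outcome <- tryCatch(
  eval(parse(text = paste0("'", x, "'"))),
  error = function(e) "parse error"
)
outcome
```

Unless the caller wraps the call in tryCatch like this, such input aborts the surrounding computation.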
eval is a mug's game in general. Normal string handling (search the string for the sequence you want and replace it) is the better approach, and using an existing library for a particular properly-specified format is best of all. For example, if you have JSON, use a JSON parser. There are many possible string literal formats that use \u escapes, all with slightly different rules, so you will want to choose the exact format correctly.
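As a rough illustration of the plain string-handling route in base R, the sketch below replaces each \uXXXX sequence directly and never calls eval. The function name unescape_u is hypothetical, and the rules are deliberately narrow: exactly four hex digits, no surrogate-pair handling, and no special treatment of an escaped backslash before the u. A real format such as JSON has more rules than this.

```r
# Hypothetical helper: replace \uXXXX (four hex digits) with the matching character.
unescape_u <- function(x) {
  # Locate every backslash-u-XXXX sequence in each element of x.
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  # Swap each match for the character with that code point.
  regmatches(x, m) <- lapply(regmatches(x, m), function(hits) {
    vapply(
      hits,
      function(h) intToUtf8(strtoi(substring(h, 3), base = 16L)),
      character(1),
      USE.NAMES = FALSE
    )
  })
  x
}

unescape_u("\\u0041BC")               # "ABC"
unescape_u("a'; print(dir()); 'b")    # returned unchanged; nothing is evaluated
```

Since the input is only ever treated as data, quotes and backslashes elsewhere in the string need no sanitising.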