Sanitising strings in R

This is related to a previous question, here: Converting a \u escaped Unicode string to ASCII

I proposed a solution involving eval(parse(text=x)), which, for non-R users, means what it says: parse the text string, then evaluate it. The aim was not to allow arbitrary code to be executed, but only to un-escape escaped Unicode text. Hence the solution:

eval(parse(text=paste0("'", x, "'")))
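As a minimal sketch, here is that approach wrapped in a function (the name `unescape_eval` is hypothetical) and applied to an input containing a \u escape. The parser, not any custom code, is what decodes the escape:

```r
# Hypothetical wrapper around the eval(parse()) approach above.
unescape_eval <- function(x) {
  # Wrap the text in single quotes so the parser reads it as one string literal,
  # then evaluate it; any \uXXXX escapes are decoded by the parser.
  eval(parse(text = paste0("'", x, "'")))
}

x <- "caf\\u00e9"   # nine characters: c, a, f, a backslash, u, 0, 0, e, 9
unescape_eval(x)    # "café"
```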

While this should be fairly safe given the restricted objective, I'd be interested to know: how much sanitisation is required to keep things safe?

At a minimum, I guess any embedded single and double quotes have to be escaped. For example, suppose we have

x <- "this is a '; print(dir()); 'string"

Then eval'ing this per the snippet above would execute the code in the middle. So we have to escape the quotes:
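You can see this without running anything by calling parse() alone: the wrapped text comes back as three separate expressions, the middle one being the injected call. A short sketch:

```r
x <- "this is a '; print(dir()); 'string"
code <- paste0("'", x, "'")
exprs <- parse(text = code)   # parse only; nothing is evaluated yet
length(exprs)                 # 3: a string, a call, and another string
deparse(exprs[[2]])           # "print(dir())"
```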

eval(parse(text=paste0("'",
                       gsub("'", "\\\\'", x),
                       "'")))

And similarly for double quotes. I don't think the escaped Unicode equivalents \u0022 and \u0027 are a problem, since the parser turns them into literal " and ' characters inside the string rather than treating them as string delimiters.
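A quick check of both points, as a sketch: after escaping single quotes with the gsub() call above, the example input parses as a single string constant equal to the original, and \u0027 or \u0022 escapes inside the literal end up as characters in the string instead of closing it.

```r
x <- "this is a '; print(dir()); 'string"

# Escape embedded single quotes, then wrap in single quotes as in the question.
escaped <- gsub("'", "\\\\'", x)
exprs <- parse(text = paste0("'", escaped, "'"))
length(exprs)              # 1: the whole input is now one string constant
identical(eval(exprs), x)  # TRUE: nothing was executed, the text round-trips

# \u0027 and \u0022 are decoded inside the literal; they do not end the string.
u <- parse(text = "'a\\u0027b \\u0022c'")
length(u)                  # 1
eval(u)                    # "a'b \"c"
```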

Are there any holes in this approach that I've missed?

Hong Ooi asked Jul 21 '13 07:07


1 Answer

this is a \'; print(dir()); 'string

is escaped to:

'this is a \\'; print(dir()); 'string'

The double backslash is evaluated as a literal backslash, so the quote after it is active (it closes the string), and the code is executed.
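A sketch of the bypass in R, using a hypothetical payload. One caveat: gsub() also escapes the quote before "string", which would leave a stray backslash outside any string and make parse() fail, so this version ends the payload with a # comment that swallows the closing quote added by paste0(). The injected call still survives as its own expression:

```r
# Hypothetical payload: a backslash placed before the quote defeats the escaping.
x <- "this is a \\'; print(dir()) #"
escaped <- gsub("'", "\\\\'", x)
code <- paste0("'", escaped, "'")
cat(code, "\n")       # prints 'this is a \\'; print(dir()) #'
exprs <- parse(text = code)
length(exprs)         # 2: the string "this is a \" and the injected call
deparse(exprs[[2]])   # "print(dir())"
```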

Also, I don't know R well, but you could probably at least cause a crash using raw control characters such as newlines, or invalid escapes.
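For R specifically, the invalid-escape case is easy to demonstrate: an escape sequence the parser does not recognise, such as \q, makes parse() signal an error instead of returning a string. A small sketch that catches the error with tryCatch():

```r
x <- "bad \\q escape"
result <- tryCatch(
  eval(parse(text = paste0("'", x, "'"))),
  error = function(e) e   # return the condition object instead of stopping
)
inherits(result, "error")  # TRUE: '\q' is not a valid escape in an R string literal
```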

eval is a mug's game in general. Normal string handling (searching the string for the sequence you want and replacing it) is a better approach, and using an existing library for a particular, properly specified format is best of all. For example, if you have JSON, use a JSON parser. There are many possible string literal formats that use \u escapes, all with slightly different rules, so you need to identify exactly which format you are dealing with.
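A minimal sketch of the plain string-handling approach in base R, assuming the only escapes to decode are \u followed by exactly four hex digits (the function name `unescape_unicode` is hypothetical). It never calls parse() or eval(), so quotes and semicolons in the input are just characters. It does not handle escaped backslashes or combine UTF-16 surrogate pairs; what those should mean depends on the exact format.

```r
unescape_unicode <- function(x) {
  # Locate every \uXXXX sequence: a literal backslash, "u", four hex digits.
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    # Drop the leading "\u", read the hex digits as a code point,
    # and turn each code point into a UTF-8 character.
    vapply(strtoi(substring(esc, 3), base = 16L), intToUtf8, character(1))
  })
  x
}

unescape_unicode("caf\\u00e9")                          # "café"
unescape_unicode("this is a '; print(dir()); 'string")  # returned unchanged
```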

bobince answered Oct 22 '22 11:10