Can I replace
source(filename, local = TRUE, encoding = 'UTF-8')
with
eval(parse(filename, encoding = 'UTF-8'))
without any risk of breakage, to make UTF-8 source files work on Windows?
I am currently loading specific source files via
source(filename, local = TRUE, encoding = 'UTF-8')
However, it is well known that this does not work on Windows, full stop.
As a workaround, Joe Cheng suggested using the following instead:
eval(parse(filename, encoding = 'UTF-8'))
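A minimal sketch of one caveat worth checking, the evaluation environment. source(local = TRUE) evaluates in parent.frame(), and eval()'s envir argument also defaults to parent.frame(), so when either call sits directly inside a function, the code should end up in that function's frame. The temporary file, the variable answer and the two wrapper functions are hypothetical, made up for illustration. The file is plain ASCII, so this checks only the environment, not the encoding handling.

```r
# Hypothetical demo file containing a single assignment.
path <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", path)

# source(local = TRUE) evaluates in the frame of the function that calls it ...
via_source <- function(file) {
  source(file, local = TRUE, encoding = "UTF-8")
  # Was `answer` created in this function's own frame?
  exists("answer", envir = environment(), inherits = FALSE)
}

# ... and so does eval(), whose envir argument defaults to parent.frame().
via_eval <- function(file) {
  eval(parse(file, encoding = "UTF-8"))
  exists("answer", envir = environment(), inherits = FALSE)
}

print(c(source = via_source(path), eval = via_eval(path)))
```

This only holds when eval(parse(...)) is written at the exact spot where source(...) was; moving it into a helper function would make parent.frame() refer to the helper's frame instead.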
This seems to work quite well,¹ but even after consulting the source code of source, I don't understand how the two differ in one crucial detail: neither source nor sys.source simply calls parse and then eval on the file content. Instead, they parse the file content, iterate manually over the parsed expressions, and eval them one by one. I do not understand why this would be necessary in sys.source (source at least uses it to show verbose diagnostics, if so instructed; sys.source does nothing of the kind):
for (i in seq_along(exprs)) eval(exprs[i], envir)
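The premise can be probed with a minimal sketch: for straightforward code, handing the whole expression vector to a single eval() call and running the sys.source-style loop should leave the target environment in the same state, since eval() on an expression vector evaluates its elements in order and returns the value of the last one. The code string and the environment names are made up for illustration.

```r
code <- "a <- 1\nb <- a + 1\nd <- b * 10"
exprs <- parse(text = code, keep.source = FALSE)

# Strategy 1: hand the whole expression vector to eval() at once.
env_once <- new.env()
last_value <- eval(exprs, env_once)  # value of the last expression

# Strategy 2: the sys.source-style loop, one length-1 expression at a time.
env_loop <- new.env()
for (i in seq_along(exprs)) eval(exprs[i], env_loop)

# Compare the variables each strategy created.
vars <- c("a", "b", "d")
print(identical(mget(vars, envir = env_once), mget(vars, envir = env_loop)))
print(last_value)
```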
What is the purpose of evaluating the statements separately? And why does it iterate over indices instead of directly over the sub-expressions? What other caveats are there?
To clarify: I am not concerned about the additional parameters of source and parse, some of which may be set via options.
¹ The reason source is tripped up by the encoding but parse isn't boils down to the fact that source attempts to convert the input text. parse does no such thing: it reads the file's byte content as-is and simply marks the Encoding of the resulting strings as UTF-8 in memory.
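A minimal sketch of the behaviour described in this footnote, assuming a throwaway temporary file (the file and the variable greeting are hypothetical). The bytes are written as UTF-8 without conversion, and parse(..., encoding = "UTF-8") is expected to mark the parsed string as UTF-8 rather than re-encode it.

```r
# Write a one-line script whose string literal contains a non-ASCII
# character, as raw UTF-8 bytes (useBytes = TRUE avoids any re-encoding).
path <- tempfile(fileext = ".R")
writeLines(enc2utf8("greeting <- \"h\u00e9llo\""), path, useBytes = TRUE)

# parse() does not convert the bytes; encoding = "UTF-8" only declares
# (marks) the parsed character strings as UTF-8.
exprs <- parse(path, encoding = "UTF-8", keep.source = FALSE)
env <- new.env()
eval(exprs, env)

print(Encoding(env$greeting))  # the declared encoding of the parsed string
print(nchar(env$greeting))     # counted in characters, not bytes
```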
This is not a full answer, as it primarily addresses the seq_along part of the question, but it is too lengthy to include as comments.
One key difference between iterating with seq_along followed by [, versus simply using for (i in x) (which I believe is similar to seq_along followed by [[ instead of [), is that the former preserves the expression: x[i] is still an expression vector of length one, carrying its srcref. Here is an example to illustrate the difference:
```
> txt <- "x <- 1 + 1
+ # abnormal expression
+ 2 *
+ 3
+ "
> x <- parse(text=txt, keep.source=TRUE)
>
> for(i in x) print(i)
x <- 1 + 1
2 * 3
> for(i in seq_along(x)) print(x[i])
expression(x <- 1 + 1)
expression(2 *
3)
```
Alternatively:
```
> attributes(x[[2]])
NULL
> attributes(x[2])
$srcref
$srcref[[1]]
2 *
3
```
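The same difference can be checked programmatically rather than by eye. A minimal sketch, reusing the txt/x example above with the line breaks written as \n; the names one_bracket, two_bracket and via_for are made up for illustration.

```r
txt <- "x <- 1 + 1\n# abnormal expression\n2 *\n3\n"
x <- parse(text = txt, keep.source = TRUE)

one_bracket <- x[2]    # `[`  keeps an expression vector of length 1
two_bracket <- x[[2]]  # `[[` extracts the bare call 2 * 3

print(is.expression(one_bracket))
print(is.call(two_bracket))
print(names(attributes(one_bracket)))  # only the subset srcref is kept
print(attributes(two_bracket))         # the bare call has no attributes

# First and last line (in txt) of the second expression, from its srcref.
print(as.integer(attr(one_bracket, "srcref")[[1]])[c(1, 3)])

# for (e in x) hands out the same bare elements that `[[` returns.
via_for <- list()
for (e in x) via_for[[length(via_for) + 1L]] <- e
print(identical(via_for, lapply(seq_along(x), function(i) x[[i]])))
```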
As for whether this has any practical impact compared with eval(parse(..., keep.source = TRUE)), I can only say that it could, but I can't imagine a situation where it does.
Note that subsetting the expression vector element by element also means the srcref attribute gets subset along with it, which could conceivably be useful (...maybe?).
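One conceivable use, sketched under clear assumptions: a hypothetical helper, eval_reporting_lines (not part of base R, and not a claim about what source or sys.source actually do with srcrefs), that evaluates expressions one at a time like the sys.source loop and, when one fails, reports the source lines of the offending expression using the srcref that [ keeps.

```r
# Hypothetical helper (not part of base R): evaluate parsed expressions one
# at a time and, on failure, report the offending expression's source lines.
eval_reporting_lines <- function(exprs, envir = parent.frame()) {
  force(envir)  # pin the caller's frame before entering the loop
  for (i in seq_along(exprs)) {
    one <- exprs[i]              # length-1 expression, subset srcref intact
    refs <- attr(one, "srcref")  # NULL if parsed with keep.source = FALSE
    where <- if (is.null(refs)) {
      "unknown lines"
    } else {
      pos <- as.integer(refs[[1]])  # pos[1] = first line, pos[3] = last line
      sprintf("lines %d-%d", pos[1], pos[3])
    }
    tryCatch(
      eval(one, envir),
      error = function(e) {
        stop(sprintf("expression %d (%s) failed: %s",
                     i, where, conditionMessage(e)), call. = FALSE)
      }
    )
  }
  invisible(NULL)
}

# Usage: the third expression fails; the first two have already run.
code <- "a <- 1\nb <- a + 1\nstop('boom')\n"
exprs <- parse(text = code, keep.source = TRUE)
env <- new.env()
msg <- tryCatch(
  { eval_reporting_lines(exprs, env); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)
```

With keep.source = FALSE (the default under Rscript) there is no srcref to subset, and the helper falls back to "unknown lines".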