Suppose I write a function that parses an input stream containing German text. Below is a toy example. The following works on my machine (because UTF-8 is the locale encoding there):
readLines(textConnection("Zürich"))
readLines(textConnection("Z\u00FCrich")) #same thing
However, I want to make sure it also works when UTF-8 is not the current locale encoding. For example, inside rApache the default is ASCII. Hence I pass the encoding parameter:
readLines(textConnection("Zürich", encoding="UTF-8"))
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"))
But this actually results in the output getting messed up. Why is this? How should I call textConnection to make sure the stream gets read properly on any platform or locale?
The suggestion by @flodel did the trick indeed:
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"), encoding="UTF-8")
However it never became clear to me why this is needed.
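A possible explanation, going by the documented roles of the two arguments: the encoding argument of textConnection controls how input strings with a marked encoding are handed to the connection (here, as UTF-8 bytes), while the encoding argument of readLines only marks the strings it returns and does not re-encode anything. With the first but not the second, the result would hold UTF-8 bytes that R treats as being in the native encoding. The sketch below is a minimal check of that reading, not a definitive account; the variable names x, con and y are just for illustration.

```r
# A string written with a \u escape is normally marked as UTF-8 by R.
x <- "Z\u00FCrich"

# encoding = "UTF-8" on textConnection: pass the marked input string to the
# connection as UTF-8 bytes instead of translating it to the native encoding.
con <- textConnection(x, encoding = "UTF-8")

# encoding = "UTF-8" on readLines: mark the lines read as UTF-8.
# This does not re-encode the bytes, it only declares what they are.
y <- readLines(con, encoding = "UTF-8")
close(con)

print(Encoding(y))      # encoding mark on the result
print(identical(y, x))  # does the round trip preserve the string?
```

Dropping encoding="UTF-8" from the readLines call and inspecting Encoding(y) again should show whether the missing mark is what garbles the output in a non-UTF-8 locale.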