
Read text as UTF-8 encoding

Tags: r, utf-8, locale

Suppose I write a function that parses an input stream containing German text. Below is a toy example. The following works on my machine (because UTF-8 is the locale encoding there):

readLines(textConnection("Zürich"))
readLines(textConnection("Z\u00FCrich")) #same thing

However, I want to make sure it also works when UTF-8 is not the current locale encoding. For example, inside rApache the default is ASCII. Hence I pass the encoding parameter:

readLines(textConnection("Zürich", encoding="UTF-8"))
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"))

But this actually results in garbled output. Why is this? How should I call textConnection to make sure the stream gets read properly on any platform and in any locale?

Jeroen Ooms asked Mar 23 '23 06:03


1 Answer

The suggestion by @flodel did indeed do the trick:

readLines(textConnection("Z\u00FCrich", encoding="UTF-8"), encoding="UTF-8")

However, it never became clear to me why this is needed.
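
A possible explanation, based on my reading of the documentation for both functions and not confirmed in this thread: the two encoding arguments appear to do different jobs. The one on textConnection controls how the input strings are converted into the bytes of the stream, while the one on readLines does not re-encode anything and only marks the returned strings. Without the second argument, the UTF-8 bytes would come back unmarked and be interpreted in the native encoding, which would explain the garbled output in a non-UTF-8 locale. A minimal sketch under that assumption, where read_utf8_lines is a hypothetical helper name:

```r
# Hypothetical helper: read lines from an in-memory character vector so that
# the result is marked as UTF-8 regardless of the current locale encoding.
read_utf8_lines <- function(x) {
  # Stream side: ask for marked input strings to be converted to UTF-8 bytes.
  con <- textConnection(x, encoding = "UTF-8")
  # Close the connection when the function exits, even on error.
  on.exit(close(con))
  # Reader side: declare the returned strings as UTF-8 (assumed: no re-encoding).
  readLines(con, encoding = "UTF-8")
}

res <- read_utf8_lines("Z\u00FCrich")
Encoding(res)    # inspect the encoding mark of the result
charToRaw(res)   # inspect the underlying bytes; "ü" in UTF-8 is the pair c3 bc
```

Checking Encoding() and charToRaw() on the result, rather than only printing it, makes it easier to tell whether the bytes or just the encoding mark went wrong when testing under a different locale.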

Jeroen Ooms answered Apr 06 '23 05:04