I wish to convert an HTML file encoded in ANSI to UTF-8, using R.
Is there a tool, or a combination of tools, that can make this work?
Thanks.
Edit: OK, I've narrowed my problem down to a different one. It is re-posted here: Using "cat" to write non-English characters into a .html file (in R)
3. Choose "UTF-8" from the drop-down box next to "Encoding" and click "Save." Your text file will be converted and saved in UTF-8, although the file extension will remain the same. You can now open and edit the document at any time, and your special characters will be preserved.
You can use iconv:
writeLines(iconv(readLines("tmp.html"), from = "ANSI_X3.4-1986", to = "UTF8"), "tmp2.html")
tmp2.html should then be encoded in UTF-8.
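To check a conversion like this end to end, here is a minimal, self-contained sketch. It makes a few assumptions that are not part of the answer above: the input is Latin-1 (which agrees with Windows-1252 for most Western European characters), the files are temporary ones created with tempfile(), and useBytes = TRUE is used so that writeLines does not re-encode the converted strings into the native locale. Note that "ANSI_X3.4-1986" is an alias for plain ASCII, so iconv may return NA for lines containing non-ASCII bytes if that name is used as the source encoding.

```r
# Hypothetical temporary files standing in for tmp.html and tmp2.html.
src <- tempfile(fileext = ".html")
dst <- tempfile(fileext = ".html")

# Write "<p>caf\xe9</p>\n" as raw bytes; 0xE9 is "e acute" in Latin-1.
writeBin(as.raw(c(0x3c, 0x70, 0x3e, 0x63, 0x61, 0x66, 0xe9,
                  0x3c, 0x2f, 0x70, 0x3e, 0x0a)), src)

# Read the lines, then convert them explicitly from Latin-1 to UTF-8.
lines_in  <- readLines(src, warn = FALSE)
lines_out <- iconv(lines_in, from = "latin1", to = "UTF-8")

# Fail early if any line could not be converted (iconv returns NA for those).
stopifnot(!anyNA(lines_out))

# useBytes = TRUE writes the UTF-8 bytes as they are, without re-encoding.
writeLines(lines_out, dst, useBytes = TRUE)

# Read the result back as raw bytes and check that it is valid UTF-8.
out_bytes <- readBin(dst, what = "raw", n = file.size(dst))
is_utf8   <- validUTF8(rawToChar(out_bytes))
```

If is_utf8 is TRUE and lines_out contains no NA, the output file holds the converted text. For a real file, replace src and dst with your own paths and "latin1" with the actual source encoding.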
Edit by Henrik in June 2015:
A working solution for Windows distilled from the comments is as follows:
writeLines(iconv(readLines("tmp.html"), from = "ANSI_X3.4-1986", to = "UTF8"),
file("tmp2.html", encoding="UTF-8"))
Update 2021: If ANSI is the encoding of the current locale, the following works as well (from = "" tells iconv to use the current locale's encoding as the source encoding):
writeLines(iconv(readLines("tmp.html"), from = "", to = "UTF8"),
file("tmp2.html", encoding="UTF-8"))
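Whether from = "" is appropriate depends on the encoding of the session's locale, so it can help to inspect that first. The following is a small sketch using base R only; the variable names are illustrative, and the codepage element of l10n_info() is normally only present on Windows.

```r
# Describe the current locale: is it multibyte, UTF-8, or Latin-1?
info <- l10n_info()
locale_is_utf8   <- isTRUE(info[["UTF-8"]])
locale_is_latin1 <- isTRUE(info[["Latin-1"]])

# On Windows, l10n_info() also reports the active code page (e.g. 1252).
code_page <- if (!is.null(info$codepage)) info$codepage else NA_integer_

# List the encoding names that this platform's iconv claims to support.
# iconvlist() may return nothing on platforms where the list is unavailable.
known <- toupper(iconvlist())
has_utf8   <- "UTF-8"  %in% known
has_cp1252 <- "CP1252" %in% known
```

If locale_is_utf8 is TRUE, from = "" would treat the input as UTF-8 already, so an explicit source encoding is the safer choice for an ANSI file in that case.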