 

Why is R reading UTF-8 header as text?

I saved an Excel table as text (*.txt). Unfortunately, Excel doesn't let me choose the encoding, so I have to open the file in Notepad (which opens it as ANSI) and re-save it as UTF-8. Then, when I read it in R:

data <- read.csv("my_file.txt",header=TRUE,sep="\t",encoding="UTF-8")

the name of the first column begins with "X.U.FEFF.". I know these are the bytes of the byte order mark (BOM) that tells programs the file is encoded as UTF-8, so they shouldn't appear as text. Is this a bug, or am I missing some option? Thanks in advance!
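As an aside (not part of the original question), you can confirm the BOM is really there by reading the first three bytes of the file in R; the filename here is the one from the question:

```r
# Read the first 3 raw bytes of the file; a UTF-8 BOM is ef bb bf
readBin("my_file.txt", what = "raw", n = 3)
```

If the output is `ef bb bf`, the file starts with a UTF-8 BOM, which `read.csv` with `encoding="UTF-8"` passes through into the first column name.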

Rodrigo asked Nov 12 '13

1 Answer

I was going to give you instructions on how to manually open the file, check for a BOM, and discard it, but then I noticed this (in ?file):

As from R 3.0.0 the encoding "UTF-8-BOM" is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).

which means that if you have a sufficiently new R interpreter,

read.csv("my_file.txt", fileEncoding="UTF-8-BOM", ...other args...)

should do what you want.
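For completeness (this is an assumption-laden sketch, not from the original answer): on an R version older than 3.0.0, where `"UTF-8-BOM"` is not available, you could read the file with plain `encoding="UTF-8"` and then strip the mangled `X.U.FEFF.` prefix that appears in the first column name, as described in the question:

```r
# Fallback for R < 3.0.0: read the file, then repair the first column name.
# "my_file.txt" is the file from the question.
data <- read.csv("my_file.txt", header = TRUE, sep = "\t", encoding = "UTF-8")

# read.csv turns the BOM into the literal prefix "X.U.FEFF." on the
# first column name; remove it if present.
names(data)[1] <- sub("^X\\.U\\.FEFF\\.", "", names(data)[1])
```

On R 3.0.0 or later, `fileEncoding="UTF-8-BOM"` is the cleaner fix, since the BOM is discarded before parsing rather than patched up afterwards.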

zwol answered Sep 18 '22