Read a UTF-8 text file with BOM

I have a text file with a byte order mark (U+FEFF) at the beginning. I am trying to read the file in R. Is it possible to avoid the byte order mark?

The function fread (from the data.table package) reads the file, but adds ļ»æ at the beginning of the first variable name:

> names(frame_pers)[1]
[1] "ļ»æreg_date"

The same happens with the read.csv function.

Currently I use a function that removes the BOM from the first column name, but I believe there should be a way to strip the BOM automatically.

remove.BOM <- function(x) setnames(x, 1, substring(names(x)[1], 4))

> names(frame_pers)[1]
[1] "ļ»æreg_date"
> remove.BOM(frame_pers)
> names(frame_pers)[1]
[1] "reg_date"
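
A byte-level variant of the same workaround can be sketched as follows. It assumes the BOM arrives as the three raw bytes EF BB BF at the start of the name, which is what the three stray characters above suggest. The name strip_bom is a hypothetical helper, not part of base R or data.table.

```r
# Hypothetical helper (not part of base R or data.table): drop a leading
# UTF-8 BOM, i.e. the bytes EF BB BF, from a single character string.
strip_bom <- function(x) {
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  bytes <- charToRaw(x)
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    stripped <- rawToChar(bytes[-(1:3)])
    Encoding(stripped) <- Encoding(x)  # keep the declared encoding mark
    return(stripped)
  }
  x  # no BOM found: return the input unchanged
}

# Build a name that starts with the BOM bytes, as in the example above.
with_bom <- rawToChar(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("reg_date")))
strip_bom(with_bom)
strip_bom("reg_date")
```

Unlike substring(names(x)[1], 4), this only removes the prefix when it really is a BOM, so it should be safe to call on files that have none. With a data.table it could be applied as setnames(frame_pers, 1, strip_bom(names(frame_pers)[1])).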

I am using the native encoding for the R session:

> options("encoding" = "")
> options("encoding")
$encoding
[1] ""

asked Feb 07 '14 10:02

djhurio

2 Answers

Have you tried read.csv(..., fileEncoding = "UTF-8-BOM")? The help page ?file says:

As from R 3.0.0 the encoding ‘"UTF-8-BOM"’ is accepted and will remove a Byte Order Mark if present (which it often is for files and webpages generated by Microsoft applications).
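
As a minimal sketch of that suggestion, the snippet below writes a small CSV that begins with a BOM and reads it back. The temporary file, its contents, and the name frame_bom are made up for illustration; only read.csv and the fileEncoding value come from the answer.

```r
# Write a small CSV that starts with a UTF-8 BOM (bytes EF BB BF).
# The file contents and column names are made up for illustration.
tmp <- tempfile(fileext = ".csv")
writeBin(
  c(as.raw(c(0xEF, 0xBB, 0xBF)),
    charToRaw("reg_date,value\n2014-02-07,1\n")),
  tmp
)

# "UTF-8-BOM" asks the file connection to drop the BOM if one is present.
frame_bom <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(frame_bom)[1]

unlink(tmp)  # clean up the temporary file
```

The first column name should come back without the stray prefix. This requires R 3.0.0 or later, per the ?file excerpt above.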

answered Sep 29 '22 13:09

hadley

This was fixed in data.table itself by a commit made between versions 1.9.6 and 1.9.8; updating your data.table installation resolves the problem.

Once done, you can just use fread:

fread("file_name.csv")

answered Sep 29 '22 13:09

MichaelChirico

Tags: r, character-encoding, unicode, utf-8, byte-order-mark