Export UTF-8 BOM to .csv in R

I am reading data through RJDBC from a MySQL database, and R displays all letters correctly (e.g., נווה שאנן). However, even when I export it using write.csv with fileEncoding="UTF-8", Bulgarian, Hebrew, Chinese and similar text comes out as escape codes such as <U+0436>.<U+043A>. <U+041B><U+043E><U+0437><U+0435><U+043D><U+0435><U+0446> (this example is a Bulgarian string, not the Hebrew one above). Other special characters such as ã and ç work fine.

I suspect this has to do with the UTF-8 BOM, but I did not find a solution online.

My OS is a German Windows 7.

Edit: I tried

con <- file("file.csv", encoding = "UTF-8")
write.csv(x, con, row.names = FALSE)

and the (as far as I know) equivalent write.csv(x, file = "file.csv", fileEncoding = "UTF-8", row.names = FALSE).

asked Sep 13 '11 13:09

Arthur G

2 Answers

The accepted answer did not help me in a similar situation (R 3.1 on Windows, where I was trying to open the file in Excel). However, based on this part of the documentation for file:

If a BOM is required (it is not recommended) when writing it should be written explicitly, e.g. by writeChar("\ufeff", con, eos = NULL) or writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con)

I came up with the following workaround:

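write.csv.utf8.BOM <- function(df, filename)
{
    con <- file(filename, "w")
    tryCatch({
        # Convert text columns to UTF-8. Numeric columns are skipped so that
        # iconv does not turn them into character (and write.csv quote them).
        for (i in seq_len(ncol(df)))
            if (is.character(df[[i]]) || is.factor(df[[i]]))
                df[[i]] <- iconv(as.character(df[[i]]), to = "UTF-8")
        # Write the BOM explicitly, then the CSV through the same connection.
        writeChar(iconv("\ufeff", to = "UTF-8"), con, eos = NULL)
        write.csv(df, file = con)
    }, finally = {close(con)})
}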

Note that df is the data.frame and filename is the path to the csv file.
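
To check whether a written file really starts with a BOM, one option is to read its first three bytes and compare them with EF BB BF, the byte sequence from the documentation quoted above. The following is only a minimal sketch: has_utf8_bom is a hypothetical helper name (not part of base R), and the temporary file stands in for a real export.

```r
# Hypothetical helper (not part of base R): TRUE if the file starts with
# the UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3L)
  identical(first_bytes, as.raw(c(0xef, 0xbb, 0xbf)))
}

# Self-contained demonstration on a temporary file.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")                # binary connection
writeBin(as.raw(c(0xef, 0xbb, 0xbf)), con)   # BOM written explicitly
writeBin(charToRaw("a,b\n1,2\n"), con)       # a tiny ASCII-only CSV body
close(con)

has_utf8_bom(tmp)   # TRUE
```

Running has_utf8_bom on the file produced by write.csv.utf8.BOM should show whether the BOM actually made it to disk on a given system.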

answered Sep 29 '22 11:09

Ron

The help page for Encoding (help("Encoding")) describes a special encoding, "bytes".

Using this, I was able to generate the CSV file with:

v <- "נווה שאנן"
X <- data.frame(v1=rep(v,3), v2=LETTERS[1:3], v3=0, stringsAsFactors=FALSE)

Encoding(X$v1) <- "bytes"
write.csv(X, "test.csv", row.names=FALSE)

Take care with the difference between factor and character columns. The following should work for both:

# Encoding() returns one value per element, so it is wrapped in any()
# to give the single TRUE/FALSE that && requires.
id_characters <- which(sapply(X,
    function(x) is.character(x) && any(Encoding(x) == "UTF-8")))
for (i in id_characters) Encoding(X[[i]]) <- "bytes"

id_factors <- which(sapply(X,
    function(x) is.factor(x) && any(Encoding(levels(x)) == "UTF-8")))
for (i in id_factors) Encoding(levels(X[[i]])) <- "bytes"

write.csv(X, "test.csv", row.names=FALSE)
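
As a small illustration of what the "bytes" marking changes, the sketch below compares the declared encoding and the raw bytes of a string before and after re-marking it. The variable names are made up for this example, and the Hebrew text is written with \u escapes only so that the snippet does not depend on the encoding of the script file.

```r
# The same Hebrew word as in the example above, written with \u escapes.
v <- "\u05e0\u05d5\u05d5\u05d4"
Encoding(v)                  # declared encoding: "UTF-8"

# Re-mark a copy as "bytes": only the declaration changes.
b <- v
Encoding(b) <- "bytes"
Encoding(b)                  # "bytes"

# The underlying bytes are identical; nothing was converted.
identical(charToRaw(v), charToRaw(b))
```

In other words, setting the encoding to "bytes" does not re-encode the data; it only changes how R treats the string from then on.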

answered Sep 29 '22 11:09

Marek

Tags: r, utf-8, byte-order-mark, export-to-csv
