unicode conversion and export in R

Tags:

r

unicode

I have written the script below to convert Unicode numeric character references into Chinese characters. The last string in temp.df[,"name_unicode"] decodes to "§®£" (without the quotes), so that people who do not read Chinese can also help.

library(RODBC)
library(Unicode)

temp.df <- data.frame(name_unicode=c("&#38515;&#22823;&#25991;",
                                     "&#38515;&#23567;&#25935;",
                                     "&#38515;&#19968;&#23665;",
                                     "&#167;&#174;&#163;"),
                      stringsAsFactors=FALSE)

temp.df[,"name_unicode_mod"] <- sapply(temp.df[,"name_unicode"],
                                        function(x) {
                                          temp <- unlist(strsplit(x,";"))
                                          temp <- sprintf("%x",as.integer(gsub("[^0-9]","",temp)))
                                          temp <- intToUtf8(as.u_char_range(temp))
                                          return(temp)
                                          })


write.csv(temp.df,file("test.csv",encoding="UTF-8"),row.names=FALSE)

The output in temp.df[,"name_unicode_mod"] displays correctly in the R console, but I need to export it in csv or xls format. I tried write.csv, write.table, and odbcConnectExcel from RODBC, but they all give me something like <U+00A7><U+00AE><U+00A3>.

Can anyone help? Thanks.

P.S. I am using R 3.0.0 on Windows 7.

lokheart asked Apr 16 '13 04:04


1 Answer

Writing the file in binary mode will work in your case. The following is a small sample function that does this.

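writeUtf8csv <- function(x, file) {
  # Open in binary mode so the bytes are written without re-encoding
  con <- file(file, "wb")
  on.exit(close(con))
  # Write each row as one comma-separated line ending in CRLF (no header row)
  invisible(apply(x, 1, function(a) {
      b <- paste(paste(a, collapse=','), '\r\n', sep='')
      # enc2utf8 ensures the raw bytes are UTF-8; endian does not matter for raw vectors
      writeBin(charToRaw(enc2utf8(b)), con)
    }))
}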

More details are shown in this reference page.
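As a rough sketch of how the binary-write idea could be used and checked, the snippet below writes a one-row data frame containing the "§®£" string from the question and reads the raw bytes back. The helper name write_utf8_csv, the header row, and the temporary file are assumptions made for illustration and are not part of the answer above. Fields are not quoted, so values containing commas would need extra handling.

```r
# Hypothetical helper (not from the answer): same binary-write idea,
# plus a header row and an explicit conversion to UTF-8.
write_utf8_csv <- function(x, path) {
  con <- file(path, "wb")   # binary mode: bytes are written as-is
  on.exit(close(con))
  rows <- apply(as.matrix(x), 1, paste, collapse = ",")
  lines <- c(paste(names(x), collapse = ","), rows)
  for (line in lines) {
    writeBin(charToRaw(enc2utf8(paste0(line, "\r\n"))), con)
  }
  invisible(path)
}

# "\u00a7\u00ae\u00a3" is the "§®£" string from the question
df <- data.frame(name = "\u00a7\u00ae\u00a3", stringsAsFactors = FALSE)
out <- write_utf8_csv(df, tempfile(fileext = ".csv"))

# Read the raw bytes back and mark them as UTF-8 to inspect the round trip
bytes <- readBin(out, "raw", n = file.info(out)$size)
decoded <- rawToChar(bytes)
Encoding(decoded) <- "UTF-8"
```

If the export worked, each of the three characters should appear in bytes as a two-byte UTF-8 sequence rather than as an escaped <U+00A7>-style code.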

Tomizono answered Oct 03 '22 11:10