Writing Unicode from R to SQL Server

I'm trying to write Unicode strings from R to SQL Server, and then use that table to power a Power BI dashboard. Unfortunately, the Unicode characters only display correctly when I load the table back into R, not when I view the table in SSMS or Power BI.

require(odbc)
require(DBI)
require(dplyr)
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=R9-0KY02L01\\SQLEXPRESS;Database=Test;trusted_connection=yes;")
testData <- tibble(Characters = "❤")  # data_frame() is deprecated; dplyr re-exports tibble()
dbWriteTable(con,"TestUnicode",testData,overwrite=TRUE)
result <- dbReadTable(con, "TestUnicode")
result$Characters

Successfully yields:

> result$Characters
[1] "❤"

However, when I pull that table in SSMS:

SELECT * FROM TestUnicode

I get two different characters:

Characters
~~~~~~~~~~
â¤

Those characters are also what appear in Power BI. How do I correctly pull the heart character outside of R?

Jacqueline Nolis asked Jan 04 '18 23:01


2 Answers

It turns out this is a bug somewhere in R, DBI, or the ODBC driver. The issue is that R hands the strings over as UTF-8, while SQL Server stores Unicode text as UTF-16LE. On top of that, when dbWriteTable creates a table, it creates a VARCHAR column for strings by default, which can't hold arbitrary Unicode characters. So you need to do both of the following:

  1. Change the column in the R data frame from being a string column to a list column of UTF-16LE raw bytes.
  2. When using dbWriteTable, specify the field type as NVARCHAR(MAX).
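
To see the mismatch concretely, here is a minimal sketch in base R (no database needed) that prints the bytes of the heart character in both encodings. The last step assumes the stored UTF-8 bytes are being displayed through a Latin-1-style single-byte code page, which is my guess at what produces the "â¤" in SSMS, not something confirmed above.

```r
# The heart character from the question, written as an escape so the
# script does not depend on the encoding of the source file.
heart <- "\u2764"

# Bytes R holds for the string (UTF-8): three bytes.
utf8_bytes <- charToRaw(enc2utf8(heart))
print(utf8_bytes)      # e2 9d a4

# Bytes an NVARCHAR column expects (UTF-16LE): one two-byte code unit.
utf16le_bytes <- iconv(heart, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
print(utf16le_bytes)   # 64 27

# Assumption: reading the three UTF-8 bytes as Latin-1 gives three separate
# characters (â, an invisible control character, and ¤).
misread <- iconv(rawToChar(utf8_bytes), from = "latin1", to = "UTF-8")
print(utf8ToInt(misread))  # 226 157 164
```

The middle character (code point 157) is a control character and normally invisible, which would explain why only two characters show up.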

This still seems like something that DBI or odbc should handle automatically, though.

require(odbc)
require(DBI)

# This function takes a string vector and turns it into a list of raw UTF-16LE bytes. 
# These will be needed to load into SQL Server
convertToUTF16 <- function(s){
  lapply(s, function(x) unlist(iconv(x,from="UTF-8",to="UTF-16LE",toRaw=TRUE)))
}

# create a connection to a sql table
connectionString <- "[YOUR CONNECTION STRING]"
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = connectionString)

# our example data
testData <- data.frame(ID = c(1,2,3), Char = c("I", "❤","Apples"), stringsAsFactors=FALSE)

# we adjust the column with the UTF-8 strings to instead be a list column of UTF-16LE bytes
testData$Char <- convertToUTF16(testData$Char)

# write the table to the database, specifying the field type
dbWriteTable(con, 
             "UnicodeExample", 
             testData, 
             append=TRUE, 
             field.types = c(Char = "NVARCHAR(MAX)"))

dbDisconnect(con)
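
Before writing to the database, it can help to check that the conversion is lossless. The following is a small sketch that round-trips the example strings through convertToUTF16 and decodes them again; it runs without a connection. The decoding step (calling iconv on the list of raw vectors) is my addition for illustration and is not part of the answer's workflow.

```r
# Same helper as above: string vector -> list of raw UTF-16LE byte vectors.
convertToUTF16 <- function(s) {
  lapply(s, function(x) unlist(iconv(x, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)))
}

original <- c("I", "\u2764", "Apples")
encoded <- convertToUTF16(original)

# Each of these characters takes exactly two bytes in UTF-16LE.
print(lengths(encoded))  # 2 2 12

# iconv() also accepts a list of raw vectors, so the bytes can be decoded
# back to UTF-8 to confirm nothing was lost.
decoded <- iconv(encoded, from = "UTF-16LE", to = "UTF-8")
print(identical(decoded, original))  # TRUE
```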

Jacqueline Nolis answered Sep 30 '22 16:09


Inspired by the previous answer and by GitHub issue r-dbi/DBI#215, "Storing unicode characters in SQL Server".

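This follows the field.types = c(Char = "NVARCHAR(MAX)") approach, but builds a named vector with one type per character column and a computed maximum length instead of MAX, because NVARCHAR(MAX) gave the error "dbReadTable/dbGetQuery returns Invalid Descriptor Index" on SQL Server (https://github.com/r-dbi/odbc/issues/112):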


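# Build a named character vector of NVARCHAR(n) types, one entry per character column.
# nvarchar(max) gave the error "dbReadTable/dbGetQuery returns Invalid Descriptor Index"
# on SQL Server (https://github.com/r-dbi/odbc/issues/112), so we compute n instead.
char_cols <- names(testData)[vapply(testData, is.character, logical(1))]
vector_nvarchar <- vapply(testData[char_cols], function(x) {
  # iconv(..., sub = "x") replaces every non-ASCII byte with "x", so nchar()
  # is safe to call and returns the UTF-8 byte length of each string.
  n <- max(nchar(iconv(x, "UTF-8", "ASCII", sub = "x")), na.rm = TRUE)
  # Guard against empty or all-NA columns, where n would be 0 or -Inf.
  paste0("NVARCHAR(", max(1, n), ")")
}, character(1))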

# xxxxt stands in for your own connection string
con <- DBI::dbConnect(odbc::odbc(), .connection_string = xxxxt, encoding = "UTF-8")

DBI::dbWriteTable(con, "UnicodeExample", testData, overwrite = TRUE, append = FALSE, field.types = vector_nvarchar)

DBI::dbGetQuery(con, iconv("select * from UnicodeExample"))
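
As a side note on the length computation, here is a short sketch comparing three ways of measuring a string: characters, UTF-8 bytes, and UTF-16 code units. As far as I know, the n in NVARCHAR(n) counts UTF-16 code units (byte pairs), so the byte-based count used above is a safe upper bound rather than an exact size. The helper utf16_units is a hypothetical name introduced here, not part of DBI or odbc.

```r
# Example strings: ASCII, a 3-byte UTF-8 character, and a 4-byte one (an emoji).
x <- c("I", "\u2764", "\U0001F600")

# Number of characters as R sees them.
n_chars <- nchar(x, type = "chars")
print(n_chars)   # 1 1 1

# Number of UTF-8 bytes; this is what the sub = "x" trick effectively counts.
n_bytes <- nchar(enc2utf8(x), type = "bytes")
print(n_bytes)   # 1 3 4

# Hypothetical helper: number of UTF-16 code units (two bytes each).
utf16_units <- function(s) {
  lengths(iconv(s, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)) %/% 2L
}
n_units <- utf16_units(x)
print(n_units)   # 1 1 2
```

The byte count is never smaller than the code-unit count, so sizing columns by bytes wastes a little space for non-ASCII text but should not truncate it.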

phili_b answered Sep 30 '22 17:09