UTF-8 / Unicode Text Encoding with RPostgreSQL

I'm running R on a Windows machine which is directly linked to a PostgreSQL database. I'm not using RODBC. My database is encoded in UTF-8 as confirmed by the following R command:

dbGetQuery(con, "SHOW CLIENT_ENCODING")
#   client_encoding
# 1            UTF8

However, when some text is read into R, it displays as strange text in R.

For example, the following text is shown in my PostgreSQL database: "Stéphane"

After exporting to R it's shown as: "StÃ©phane" (the é is encoded as Ã©)
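The pattern above is what UTF-8 bytes look like when they are interpreted as Latin-1 / Windows-1252. A minimal sketch that reproduces it in plain R, with no database involved (the variable names are only illustrative):

```r
# "Stéphane", written with a Unicode escape so the script itself stays ASCII.
original <- enc2utf8("St\u00e9phane")

# The UTF-8 byte sequence: the é is stored as the two bytes c3 a9.
bytes <- charToRaw(original)

# Reinterpret the same bytes as Latin-1, where every byte is one character,
# then convert back to UTF-8 so the result prints consistently.
misread <- rawToChar(bytes)
Encoding(misread) <- "latin1"
misread <- enc2utf8(misread)

print(bytes)
print(misread)
```

The misread string has nine characters instead of eight, because the two bytes of é are shown as two separate characters.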

When importing to R I use the dbConnect command to establish a connection and the dbGetQuery command to query data using SQL. I do not specify any text encoding anywhere when connecting to the database or when running a query.

I've searched online and can't find a direct resolution to my issue. I found this link, but their issue is with RODBC, which I'm not using.

This link is helpful in identifying the symbols, but I don't just want to do a find & replace in R... way too much data.

I tried running the following commands and got a warning.

Sys.setlocale("LC_ALL", "en_US.UTF-8")
# [1] ""
# Warning message:
# In Sys.setlocale("LC_ALL", "en_US.UTF-8") :
#   OS reports request to set locale to "en_US.UTF-8" cannot be honored
Sys.setenv(LANG="en_US.UTF-8")
Sys.setenv(LC_CTYPE="UTF-8")

The warning occurs on the Sys.setlocale("LC_ALL", "en_US.UTF-8") command. My intuition is that this is a Windows-specific issue that doesn't occur on Mac/Linux/Unix.

asked Jan 27 '14 22:01

David L

1 Answer

As Craig Ringer said, setting client_encoding to windows-1252 is probably not the best thing to do. Indeed, if the data you're retrieving contains a single exotic character, you're in trouble:

Error in postgresqlExecStatement(conn, statement, ...) : RS-DBI driver: (could not Retrieve the result : ERROR: character 0xcca7 of encoding "UTF8" has no equivalent in "WIN1252" )

On the other hand, getting your R environment to use Unicode could be impossible (I have the same problem as you with Sys.setlocale... Same in this question too.).
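To see which situation a given session is in, the locale can be inspected rather than set. This is only a small diagnostic sketch using base R; the variable names are illustrative:

```r
# Inspect the current character-type locale and whether R treats it as UTF-8.
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()

print(ctype)
print(info[["UTF-8"]])  # TRUE only when the session's native encoding is UTF-8
```

If the second value is FALSE, strings without an encoding mark are interpreted in the native code page, which is consistent with the garbled output in the question.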

A workaround is to manually declare UTF-8 encoding on all your data, using a function like this one:

set_utf8 <- function(x) {
  # Declare UTF-8 encoding on all character columns:
  chr <- sapply(x, is.character)
  x[, chr] <- lapply(x[, chr, drop = FALSE], `Encoding<-`, "UTF-8")
  # Same on column names:
  Encoding(names(x)) <- "UTF-8"
  x
}

And you have to use this function in all your queries:

set_utf8(dbGetQuery(con, "SELECT myvar FROM mytable"))
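To check what declaring the encoding does without a database connection, here is a sketch that simulates a query result. The data frame fake_result is made up for illustration, and the function is a slightly defensive variant of the one above (it uses vapply and skips the assignment when there are no character columns), not the original:

```r
# Variant of set_utf8 with a guard for data frames without character columns.
set_utf8 <- function(x) {
  chr <- vapply(x, is.character, logical(1))
  if (any(chr)) {
    x[chr] <- lapply(x[chr], `Encoding<-`, "UTF-8")
  }
  Encoding(names(x)) <- "UTF-8"
  x
}

# Simulated query result (assumption): the UTF-8 bytes of "Stéphane",
# stored as a string that carries no encoding mark.
utf8_bytes <- as.raw(c(0x53, 0x74, 0xc3, 0xa9, 0x70, 0x68, 0x61, 0x6e, 0x65))
fake_result <- data.frame(myvar = rawToChar(utf8_bytes),
                          stringsAsFactors = FALSE)

before <- Encoding(fake_result$myvar)  # no mark yet
fixed <- set_utf8(fake_result)
after <- Encoding(fixed$myvar)         # now declared as UTF-8

print(c(before = before, after = after))
```

Note that Encoding<- only changes the mark on the string; the bytes are untouched. That is why this works when the driver already delivers UTF-8 bytes, and why it would not repair data that arrived in a different encoding.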

EDIT: Another possibility is to use RPostgres instead of RPostgreSQL. I tested it (with the same config as in your question), and as far as I can see all declared encodings are automatically set to UTF-8.
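For reference, a connection through RPostgres might look like the sketch below. The helper name connect_rpostgres and all connection details are placeholders, not values from the question, and the snippet assumes the DBI and RPostgres packages are installed:

```r
# Hypothetical helper: every connection detail here is a placeholder.
connect_rpostgres <- function(dbname, host = "localhost", port = 5432,
                              user = NULL, password = NULL) {
  DBI::dbConnect(
    RPostgres::Postgres(),
    dbname = dbname, host = host, port = port,
    user = user, password = password
  )
}

# Usage (needs a reachable PostgreSQL server, so it is left commented out):
# con <- connect_rpostgres("mydb", user = "me", password = "secret")
# res <- DBI::dbGetQuery(con, "SELECT myvar FROM mytable")
# Encoding(res$myvar)   # check which encoding the driver declared
# DBI::dbDisconnect(con)
```

Checking Encoding() on a returned character column is a quick way to confirm whether the driver marks strings as UTF-8 in your own setup.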

answered Sep 28 '22 04:09

Scarabee
