Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to change dplyr::tbl connection encoding to utf8?

Tags:

r

dplyr

rmysql

In a MySQL database, a table is encoded in utf8, but for some reason the connection is in latin1.

res <- RMySQL::dbSendQuery(con,"show variables like 'character_set_%'")
dbFetch(res)
           Variable_name                      Value
1     character_set_client                     latin1
2 character_set_connection                     latin1
3   character_set_database                    utf8mb4
4 character_set_filesystem                     binary
5    character_set_results                     latin1
6     character_set_server                     latin1
7     character_set_system                       utf8
8       character_sets_dir /usr/share/mysql/charsets/

This page explain how to set the connection's character set to utf8 using RMySQL.

RMySQL::dbGetQuery(con,"show variables like 'character_set_%'")
RMySQL::dbGetQuery(con,"set names utf8")

But I actually prefer to use the dplyr::tbl to query the database. Since the connection created by dplyr::src_mysql only has the possibility to send sql statements that create tables. What is the dplyr way to set the connection setting to use utf8 encoding?

like image 863
Paul Rougieux Avatar asked Jun 19 '15 07:06

Paul Rougieux


2 Answers

I ran into the same problem, which I solved as follows:

foo_db <- src_mysql(host='0.0.0.0',user='dbuser',password='a_password',
                    dbname='FlightTimes',port=3336)
dbGetQuery(foo_db$con,'SET NAMES utf8')

I found this was possible by looking at the structure of foo_db via str(foo_db), seeing there was a con attribute of class MySQLConnection, then applying your dbGetQuery mantra.

like image 175
shabbychef Avatar answered Sep 28 '22 15:09

shabbychef


Edit the server option file (located in /etc/mysql/my.cnf on a Debian system) and add the following options:

collation-server = utf8_unicode_ci
character-set-server = utf8
skip-character-set-client-handshake

The mysql server configuration file can also be edited with mysql-workbench.

After this change, dplyr::tbl fetches character vector encoded in utf-8.

like image 23
Paul Rougieux Avatar answered Sep 28 '22 14:09

Paul Rougieux