Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to connect to MySQL using UTF8 within a perl script?

In a nutshell:

Within a Perl-Script: How do I connect to MySQL in a way that allows to transmit the four-byte unicode character U+1F61C ("😜") from the perl script to a MySQL-Table where this character should be stored?

Using {mysql_enable_utf8 => 1} doesn't solve the problem.

In detail:

I have exactly the same problem as described in the Question ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\x9C' for column 'comment' at row 1 and even with the same Unicode character (😜 = U+1F61C = FACE WITH STUCK-OUT TONGUE AND WINKING EYE) which produces the error message

DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9F\x98\x9C' for column ...

But I don't use PHP, I use Perl.

The accepted answer in the other question says:

  • Run MySQL 5.5 or later.
    I check the version:

    mysql> select version();
    +-------------------------+
    | version()               |
    +-------------------------+
    | 5.7.13-0ubuntu0.16.04.2 |
    +-------------------------+
    

    So it is 5.7 which is later than 5.5.
    ✅checked

  • Set table's character to utf8mb4.
    I check the character set of my database, my table and even of the reported column:

    mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA 
        -> WHERE schema_name = "myDatabase";
    +----------------------------+
    | default_character_set_name |
    +----------------------------+
    | utf8mb4                    |
    +----------------------------+
    
    mysql> SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,
        ->        information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
        -> WHERE CCSA.collation_name = T.table_collation
        ->   AND T.table_schema = "myDatabase"
        ->   AND T.table_name   = "myTable";
    +--------------------+
    | character_set_name |
    +--------------------+
    | utf8mb4            |
    +--------------------+
    
    mysql>   SELECT character_set_name FROM information_schema.`COLUMNS` 
        -> WHERE table_schema = "myDatabase"
        ->   AND table_name   = "myTable"
        ->   AND column_name  = "myColumn";
    +--------------------+
    | character_set_name |
    +--------------------+
    | utf8mb4            |
    +--------------------+
    

    So my database, my table and the reported column all use the character set utf8mb4.
    ✅checked

  • Enable UTF8 on your MySQL connection.
    This seems to be the problem. The answer to the other question says:

    SET NAMES utf8, or use an option when connecting that similarly enables it.

I don't know how to SET NAMES utf8 within a perl script, so I did it how I did it over the last years. I think that this is "an option when connecting that similarly enables it".
It's at the end of the long line that begins with my $dbh = DBI->connect:

#!/usr/bin/perl -w
use strict;
use warnings;
use utf8;
use Encode;
use DBI;
binmode STDOUT, ":utf8";

#Here I connect using the parameter mysql_enable_utf8 (create database handle):
my $dbh = DBI->connect('DBI:mysql:database=myDatabase;host=localhost','aUser','aPassword',{mysql_enable_utf8 => 1});

#Prepare the statement (create statement handle):
my $sth = $dbh->prepare('INSERT INTO `myTable` (`myColumn`) VALUES(?);');

#This doesn't work: 
$sth->execute('😜');

#This doesn't work either: 
$sth->execute(encode_utf8('😜'));

#end processing:
$dbh->disconnect();
exit(0);

Both executes throw the same error (only the line number at the end changes):

DBD::mysql::st execute failed: Incorrect string value: '\xF0\x9F\x98\x9C' for column 'myColumn' at row 1 at myTestScript.pl line 16.

What am I doing wrong?
How can I do it better?

like image 351
Hubert Schölnast Avatar asked Aug 23 '16 08:08

Hubert Schölnast


People also ask

How do I set MySQL database to utf8?

To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace dbname with the database name: Copy ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci; To exit the mysql program, type \q at the mysql> prompt.

Does MySQL support utf8?

MySQL supports multiple Unicode character sets: utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character.


1 Answers

The problem is with the SET NAMES utf8 command. In MySQL the utf8 character set is not truly utf8, it supports characters up 3 bytes only and the character in question has 4 bytes:

The utf8 character set in MySQL has these characteristics:

• No support for supplementary characters (BMP characters only).

• A maximum of three bytes per multibyte character.

The true utf8 is the utf8mb4 that you use as character set in the field itself. So, use SET NAMES utf8mb4

So from Perl you should use {mysql_enable_utf8mb4 => 1} instead of {mysql_enable_utf8 => 1}.

like image 58
Shadow Avatar answered Oct 14 '22 04:10

Shadow