I am trying to write utf-8 strings to a MySQL table using perl/DBI. For some reason the string is getting truncated at the first non-ascii character.
For example, if I set up the following table:
CREATE DATABASE testdb DEFAULT CHARSET=utf8;
CREATE TABLE testdb.testtable (textval CHAR(30)) DEFAULT CHARSET=utf8;
And then run the following perl code:
#!/usr/bin/perl
use strict;
use DBI;
my $dbh = DBI->connect('DBI:mysql:host=localhost;database=testdb', 'testuser', 'somepassword', {mysql_enable_utf8 => 1}) or die $DBI::errstr;
$dbh->do('SET NAMES utf8');
$dbh->do("INSERT INTO testtable (textval) VALUES ('the N\xFCrburgring')");
It actually writes "the N". (when it should be writing "the Nürburgring")
Looking in the MySQL query log, I see this:
271 Query INSERT INTO testtable (textval) VALUES ('the Nürburgring')
So the string is reaching the DB server intact.
If I enter the same query directly in the MySQL console:
INSERT INTO testtable (textval) VALUES ('the Nürburgring');
The entire string is written correctly. Any idea what I'm doing wrong?
You set the attribute mysql_enable_utf8
, so you promised the interface that you will give it a Perl string of characters. However, this is a buffer of octets in Latin1 encoding.
use Devel::Peek qw(Dump);
Dump "the N\xfcrburgring";
# FLAGS = (POK,READONLY,pPOK)
# PV = 0x208e4f0 "the N\374rburgring"\0
The fix is simple. Either notate literal characters without \x
escapes, use the utf8
pragma to tell Perl that your source code is in UTF-8 and save the source in UTF-8 encoding with your editor…
use utf8;
use Devel::Peek qw(Dump);
Dump "the Nürburgring";
# FLAGS = (POK,READONLY,pPOK,UTF8)
# PV = 0x20999f0 "the N\303\274rburgring"\0 [UTF8 "the N\x{fc}rburgring"]
… or decode the octets into a string. Most of the time you're dealing not with literals, but with data coming from the outside, so better learn about the whole topic of encoding.
use Encode qw(decode);
use Devel::Peek qw(Dump);
Dump decode 'Latin1', "the N\xfcrburgring";
# FLAGS = (TEMP,POK,pPOK,UTF8)
# PV = 0x208f6b0 "the N\303\274rburgring"\0 [UTF8 "the N\x{fc}rburgring"]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With