UTF-8 encoding in Perl and MySQL

Tags:

mysql

utf-8

perl

My database (MySQL) uses the utf8_general collation. I read data from the database and show it on a web page (developed in Perl), but Swedish characters (ä, å, ö) are displayed as different characters. When I check the MySQL database directly, the data there contains the correct ä, å, ö characters, so there seems to be an encoding problem when the data is accessed. I connect to the database with the following code:

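# Connect; the Swedish message "Kunde inte ansluta till" means "Could not connect to".
my ($dbh) = DBI->connect($config{'dbDriver'}, $config{'dbUser'}, $config{'dbPass'})
    or die "Kunde inte ansluta till $config{'dataSource'}: " . $DBI::errstr;
# Ask DBD::mysql to treat text columns as UTF-8, and switch the connection charset.
$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('set names utf8');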
dotnetrocks asked Oct 31 '12 10:10


1 Answer

If each ä/å/ö appears in the output as two characters (for example, Ã¤ in place of ä), you may be double-encoding the characters. Given that the question already sets $dbh->{'mysql_enable_utf8'} = 1;, I suspect this is the most likely cause. Another possibility, since you're displaying this on a web page, is that the page does not declare UTF-8 as its charset in its <head>, so the browser may be guessing the character encoding incorrectly.
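To make the double-encoding symptom concrete, here is a minimal sketch using only the core Encode module. It does not touch a database: the byte string below simply stands in for what a UTF-8 connection would deliver for "ä", and the variable names and the hex() helper are illustrative assumptions, not part of the question's code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Stand-in for the raw UTF-8 bytes of "ä" (U+00E4) coming from the database.
my $bytes_from_db = "\xC3\xA4";

# Decode once on input: this yields a one-character Perl string.
my $char = decode('UTF-8', $bytes_from_db);

# Encode once on output: the correct two-byte UTF-8 sequence.
my $encoded_once = encode('UTF-8', $char);

# Encode the already-encoded bytes again: each byte is treated as a
# Latin-1 character and re-encoded, which is the double-encoding bug.
my $encoded_twice = encode('UTF-8', $encoded_once);

printf "characters after decode: %d\n", length $char;               # 1
printf "encoded once:  %s\n", hex_bytes($encoded_once);             # C3 A4
printf "encoded twice: %s\n", hex_bytes($encoded_twice);            # C3 83 C2 A4
```

A browser that reads the twice-encoded bytes as UTF-8 shows Ã¤ instead of ä, which matches the "two characters per letter" symptom.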

Take a close look at your webapp framework, templating system, etc. to ensure that the values are only being encoded once between when they're retrieved from the database and when they reach the user's browser. Many frameworks/template engines (such as the combination of Dancer and TT that I normally use) will handle output encoding automatically if you configure them correctly, which means that the data will be double-encoded if it's explicitly encoded prior to being output.
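As a rough illustration of "encode exactly once", the sketch below builds a plain CGI-style response by hand rather than through Dancer or TT. It assumes the value arrives as an already decoded character string (which is what mysql_enable_utf8 is meant to give you); build_response and the sample value are hypothetical. It also declares the charset in both the HTTP header and the <head>, so the browser does not have to guess.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes a decoded character string and returns the
# complete response as UTF-8 bytes. This is the only place encoding happens.
sub build_response {
    my ($text) = @_;
    my $header = "Content-Type: text/html; charset=utf-8\r\n\r\n";
    my $html   = "<!DOCTYPE html>\n"
               . "<html><head><meta charset=\"utf-8\"><title>Demo</title></head>\n"
               . "<body><p>$text</p></body></html>\n";
    return encode('UTF-8', $header . $html);
}

# Stand-in for a decoded database value "åäö" (assumption: no real DB here).
my $value = decode('UTF-8', "\xC3\xA5\xC3\xA4\xC3\xB6");

my $response = build_response($value);

# Write raw bytes: no :encoding layer on STDOUT, so nothing encodes twice.
binmode STDOUT;
print $response;
```

If a framework or an :encoding(UTF-8) layer on the output handle already encodes for you, the explicit encode() call above should be dropped instead, so that only one of the two mechanisms is active.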

Dave Sherohman answered Sep 29 '22 03:09