my database (MySql) has a utf8_general collation. I am accessing data from database and showing a webpage (developed in Perl), it is showing Swedish characters (ä,å,ö) with a different characters. I checked in Mysql database, there I can see the data with ä,å,ö characters in it. It seems, there is a encoding problem while accessing data. While connecting to database, used following code
my($dbh) = DBI->connect($config{'dbDriver'},$config{'dbUser'},$config{'dbPass'}) or die "Kunde inte ansluta till $config{'dataSource'}: " . $DBI::errstr;
$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('set names utf8');
MySQL supports multiple Unicode character sets: utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead.
UTF-8 is a valid IANA character set name, whereas utf8 is not. It's not even a valid alias. it refers to an implementation-provided locale, where settings of language, territory, and codeset are implementation-defined.
PHP UTF-8 Encoding – modifications to your php. The first thing you need to do is to modify your php. ini file to use UTF-8 as the default character set: default_charset = "utf-8"; (Note: You can subsequently use phpinfo() to verify that this has been set properly.)
Definition and Usage. The utf8_encode() function encodes an ISO-8859-1 string to UTF-8. Unicode is a universal standard, and has been developed to describe all possible characters of all languages plus a lot of symbols with one unique number for each character/symbol.
If each ä/å/ö is being represented in the output by two bytes, then it's also possible that you may be double-encoding the characters. (Given that the question already shows you doing $dbh->{'mysql_enable_utf8'} = 1;
, I suspect that this is the most likely case.) Another possibility, given that you're displaying this on a web page, is that the page may not be specifying that the charset is UTF-8 in its <head>
and the browser could be guessing incorrectly at the character encoding it uses.
Take a close look at your webapp framework, templating system, etc. to ensure that the values are only being encoded once between when they're retrieved from the database and when they reach the user's browser. Many frameworks/template engines (such as the combination of Dancer and TT that I normally use) will handle output encoding automatically if you configure them correctly, which means that the data will be double-encoded if it's explicitly encoded prior to being output.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With