I can't figure this out for the life of me.
I have a query that pulls translations of elements on a page. So any number of 15 languages can appear on that page. When I start to add languages like Swedish anything that has a symbol such as ö
results in the whole field returning a null string.
I've verified the encoding on the table and it claims it's using UTF-8
but seeing as how this doesn't work I'm confused.
Here is the query I'm working with:
SELECT
form.form_id,
elem.elem_type,
elem.elem_name,
elem.elem_format,
elem.elem_required,
trans.trans_label,`
trans.trans_description,
trans.trans_defaultValue,
trans.trans_other,
elem.elem_advancedcommand
FROM
events_form form
LEFT JOIN
events_form_elements elem
ON
form.event_id = elem.event_id
INNER JOIN
events_form_translations trans
ON
elem.elem_id = trans.elem_id
INNER JOIN
events_form_languages lang
ON
trans.lang_id = lang.lang_id
WHERE
form.form_id = '{$formid}' AND lang.language = '{$language}'
ORDER BY
elem.elem_sortorder
Now I tried to do something like:
CONVERT(CAST(trans.trans_description as BINARY) USING latin1) as trans_description,
To force it to covert the encoding but that doesn't yield a result at all.
After I get the result it's immediately json_encoded and returned to the user (Ajax Request). I DON'T think it's the json_encode as doing a print_r
of the output array yields the same issues.
Also.. lastly, the system I'm building on is using xPDO so I'm not too sure if that's the issue either.
EDIT:
It seems that PHP IS returning a correct value or at least a value for example here is a print_r
dump:
[trans_label] => Ditt f�rnamn?
[trans_description] =>
[trans_defaultValue] => First Name
So it seems that when my json_encode touches that string is when it turns the string to null.
what is the difference between utf8 and latin1? They are different encodings (with some characters mapped to common byte sequences, e.g. the ASCII characters and many accented letters). UTF-8 is one encoding of Unicode with all its codepoints; Latin1 encodes less than 256 characters.
utf-8 can store only 1, 2 or 3 bytes characters, while utf8mb4 can store 4 bytes characters as well. utf-8 is a subset of characters given by utf8mb4 .
MySQL supports multiple Unicode character sets: utf8mb4 : A UTF-8 encoding of the Unicode character set using one to four bytes per character. utf8mb3 : A UTF-8 encoding of the Unicode character set using one to three bytes per character. This character set is deprecated in MySQL 8.0, and you should use utfmb4 instead.
The CHAR and VARCHAR types are declared with a length that indicates the maximum number of characters you want to store. For example, CHAR(30) can hold up to 30 characters. The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255.
Your PDO connection string should specify the encoding. For example:
mysql:host=localhost;port=3306;dbname=test;charset=utf8
This controls the encoding that the database driver will use when it returns a result, and the encoding the driver assumes your queries are in. If you don't specify it, the default encoding will be used. Often the default is latin1
.
You can confirm this by printing the hexadecimal representation of the data with bin2hex
in PHP: the ö in förnamn is being returned as f6
. If the text was encoded in UTF-8 you would obtain c3b6
.
You said nothing about the encoding of your web pages.
Do you have that line in the <head>
section of your page to force the encoding to UTF-8?
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With