I am working on a page that uses JavaScript to send data to a PHP script via an AJAX POST. The problem is that if the input is in a non-Latin script, I end up storing gibberish in the MySQL table. The Latin alphabet works fine.
The page itself can render UTF-8 characters when they are in data provided on page load; it's the POST that I struggle with.
اختبار
and save. See the Network POST request in browser's dev tools.
The post is made through the following JS function
function createEmptyStack(stackTitle) {
    return $.ajax({
        type: 'POST',
        url: 'ajax.php',
        data: {
            "do": 'createEmptyStack',
            newTitle: stackTitle
        },
        dataType: "json"
    });
}
Here's my PHP code.
header('Content-Type: text/html; charset=utf-8');
$newTitle = trim($_POST['newTitle']);
$db->query("
INSERT INTO t1(project_id, label)
VALUES (".$_SESSION['project_id'].", '".$newTitle."')");
When I check for encoding on the page like this:
mb_detect_encoding($_POST['newTitle'], "auto");
I get result: UTF-8
I also tried the following header:
header("Content-type: application/json; charset=utf-8");
The collation of the MySQL table where the data is supposed to go is set to utf8_general_ci.
I have another page with a form where users populate the same table, and it works perfectly fine with ANY language. When I looked at why that page can insert similar data into the db successfully, I found the following line above its insert query:
mysql_query("SET NAMES utf8");
I tried putting the same line above my query, but the data still looks like gibberish. I also tried the following couple of alternatives:
mysql_query("SET CHARACTER SET utf8 ");
and
mysql_set_charset('utf8', $db);
...but to no avail. I'm stumped and need help figuring this out.
Environment:
PHP 5.6.40 (cgi-fcgi)
MySQL 5.6.45
UPDATE
I ran more tests.
I used a phrase "this is a test" in Arabic - هذا اختبار
It seems that the ajax.php code works properly. After the db insert it returns UTF-8 encoded values that look like "\u0647\u0630\u0627 \u0627\u062e\u062a\u0628\u0627\u0631", and the encoding is reported as "UTF-8"; however, the inserted data in my db table appears garbled rather than as هذا اختبار
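As a quick sanity check on the escaped response above, here is a minimal JavaScript sketch showing that the `\uXXXX` sequences are only JSON's ASCII-safe escaping of the same text (PHP's `json_encode` emits them by default unless `JSON_UNESCAPED_UNICODE` is passed). The variable names are illustrative and are not taken from the page's code.

```javascript
// The JSON string literal as it would appear on the wire
// (the backslashes here are literal characters in the response body).
const wire = '"\\u0647\\u0630\\u0627 \\u0627\\u062e\\u062a\\u0628\\u0627\\u0631"';

// JSON.parse turns each \uXXXX escape back into the character it stands for.
const decoded = JSON.parse(wire);

// List the code points to confirm they are ordinary Arabic letters plus a space.
const codePoints = Array.from(decoded, (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);

console.log(decoded === "هذا اختبار");
console.log(codePoints.join(" "));
```

If the comparison holds, the response itself is intact, which points at the database connection rather than the AJAX round trip.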
So why am I not jumping to converting my db table to a different collation? A couple of reasons: it has nearly half a million records, and it actually works properly when I go to another page that does a very similar INSERT.
Turns out my other page is using ASCII encoding when inserting the data. So it's only natural that I try to convert to ASCII in ajax.php. The problem is that I end up with blank data. I am so confused now...
Thanks
FIXED: based on a few clues, I ended up rewriting all the functions for this page to use PDO, and it worked!
Ø§Ù„Ù…Ø±Ø§ÙƒØ²
is Mojibake, or possibly "double encoding", for المراكز. Please do SELECT col, HEX(col) ...
to see which of these the stored bytes look like:
Mojibake: D8A7D984D985D8B1D8A7D983D8B2
double encoding: C398C2A7C399E2809EC399E280A6C398C2B1C398C2A7C399C692C398C2B2
If Mojibake:
<meta charset=UTF-8>
If double-encoding: this is caused by converting from latin1 (or whatever) to utf8, then treating those bytes as if they were latin1 and repeating the conversion.
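To make the two hex patterns above concrete, here is a rough JavaScript sketch that reproduces them for المراكز. It assumes the "latin1" step behaves like Windows-1252 (which is how MySQL's latin1 treats bytes 0x80-0x9F); `toHex`, `utf8Bytes` and `readAsCp1252` are hypothetical helper names made up for this illustration.

```javascript
// Windows-1252 differs from ISO-8859-1 only in bytes 0x80-0x9F.
// Bytes with no assigned character are passed through unchanged.
const CP1252_HIGH = {
  0x80: 0x20ac, 0x82: 0x201a, 0x83: 0x0192, 0x84: 0x201e, 0x85: 0x2026,
  0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02c6, 0x89: 0x2030, 0x8a: 0x0160,
  0x8b: 0x2039, 0x8c: 0x0152, 0x8e: 0x017d, 0x91: 0x2018, 0x92: 0x2019,
  0x93: 0x201c, 0x94: 0x201d, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
  0x98: 0x02dc, 0x99: 0x2122, 0x9a: 0x0161, 0x9b: 0x203a, 0x9c: 0x0153,
  0x9e: 0x017e, 0x9f: 0x0178,
};

// Format a byte array the way MySQL's HEX() prints it.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0"))
    .join("")
    .toUpperCase();
}

// Encode a string as UTF-8 bytes.
function utf8Bytes(text) {
  return new TextEncoder().encode(text);
}

// Misread bytes as Windows-1252: one character per byte.
function readAsCp1252(bytes) {
  return Array.from(bytes, (b) =>
    String.fromCharCode(CP1252_HIGH[b] || b)
  ).join("");
}

const word = "المراكز";

// Encoded once: correct UTF-8 bytes (2 bytes per Arabic letter).
const storedOnce = toHex(utf8Bytes(word));

// Those bytes misread as cp1252 give the gibberish text...
const mojibakeText = readAsCp1252(utf8Bytes(word));

// ...and encoding that gibberish as UTF-8 again is the double encoding.
const storedTwice = toHex(utf8Bytes(mojibakeText));

console.log(storedOnce);
console.log(mojibakeText);
console.log(storedTwice);
```

Comparing the `HEX(col)` output against these two strings shows whether the conversion happened once or twice for a given row.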
More discussion:
Trouble with UTF-8 characters; what I see is not what I stored
Do not use the mysql_* interface in PHP; switch to the mysqli_* or PDO interfaces. mysql_* was deprecated in PHP 5.5 and removed in PHP 7.0.
If your database is latin1, it will store unicode characters as multi-byte characters. If it's utf-8 based, it will still store multiple characters but displayed in a more "sensible" manner.
If your ر character is represented as XYZ (3 bytes), then when you retrieve XYZ, the browser will reassemble them into a visible ر.
However, if your database is utf-8, it'll further encode each component, so that you are "reliably" seeing XYZ in the end. Let's say X is denoted as x1,x2, and Y is just y, and Z is z1,z2,z3, so instead of seeing ر, which is stored as XYZ, you now see x1x2yz1z2z3, which is shown as XYZ.
Try converting your database to latin1 to at least confirm my theory. Thanks.
Edit:
There is no need to use a utf8 js library. Make sure your page's character encoding is utf8:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
When you POST the data, you can encode it with encodeURIComponent before sending it with an XHR request. I'm not sure whether the jQuery flavor of $.ajax already does the encoding.
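For illustration, here is a small sketch of what percent-encoding does to the question's payload. `buildFormBody` is a made-up helper, not a jQuery function; as far as I know, jQuery serializes a plain-object `data` option through `$.param`, which itself relies on `encodeURIComponent`, so manual encoding should only be needed when you build the request body by hand.

```javascript
// Hypothetical helper: builds an application/x-www-form-urlencoded body
// from a flat object, percent-encoding keys and values as UTF-8.
function buildFormBody(fields) {
  return Object.entries(fields)
    .map(([key, value]) =>
      encodeURIComponent(key) + "=" + encodeURIComponent(value)
    )
    .join("&");
}

// Same fields as the createEmptyStack() call in the question.
const body = buildFormBody({ "do": "createEmptyStack", newTitle: "اختبار" });
console.log(body);

// Decoding the value gives back the original text, so nothing is lost in transit.
const roundTrip = decodeURIComponent(body.split("newTitle=")[1]);
console.log(roundTrip === "اختبار");
```

Each Arabic letter becomes two `%XX` groups, one per UTF-8 byte, which is what the Network tab shows for a correctly encoded POST.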