
jQuery: AJAX umlauts & special characters are a mess

I've just created my first ajax function with jQuery which actually works, but unfortunately the character encoding (for characters like ä, ö, ü, ß, č, ć, å, ø) is a nightmare.

My files and my database are all UTF-8. I've tried a multitude of options in the ajax function and the PHP function, none of which were satisfactory.

This is my AJAX call:

    var dataString = {
        'name': name,
        'mail': mail
        // other stuff
    };

    $.ajax({
        type: "POST",
        url: "/post.php",
        data: dataString,
        contentType: "application/x-www-form-urlencoded;charset=UTF-8",
        cache: false,
        success: function(html){
            // do stuff
        }
    });

I've tried it without contentType: "application/x-www-form-urlencoded;charset=UTF-8" and I've tried to wrap the affected data in encodeURIComponent(), none of which worked.

When I use that AJAX with htmlentities() in my php, my umlauts look like this in plain text: UE �, AE �, OE �, ue ü, ae ä, oe o

And like this in the database: UE Ü , AE Ä, OE Ö, ue ü, ae ä, oe o

If I don't use htmlentities() but mysql_real_escape_string() instead (or neither), they look good in plain text, but they look like this in the database: AE Ä, OE Ö, UE Ü, ae ä oe ö ue ü

I've been trying tons of options for hours now, but I can't find a solution that works. So far the only option I seem to have is leaving them looking like a total mess in the database, but that would be very counterproductive if those data sets need to be edited.

rayne asked Mar 29 '10 15:03


1 Answer

I've tried to wrap the affected data in encodeURIComponent()

Nah, if you're passing in a {} object, jQuery will take care of UTF-8 and URL-encoding it for you.
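To see why the extra encodeURIComponent() call hurts rather than helps, here is a small sketch. It assumes the standard URLSearchParams class as a stand-in for the form-encoding step (jQuery's own serializer is not reproduced here), and both the formEncode helper and the sample name are made up for illustration.

```javascript
// Stand-in for the form-encoding step applied to a plain data object.
// Assumption: URLSearchParams is used for illustration only; jQuery has its own serializer.
function formEncode(data) {
  return new URLSearchParams(data).toString();
}

// Passing the raw string: the non-ASCII character becomes percent-encoded UTF-8 bytes.
const once = formEncode({ name: 'Jürgen' });

// Pre-encoding by hand: the percent signs themselves get encoded a second time.
const twice = formEncode({ name: encodeURIComponent('Jürgen') });

console.log(once);  // name=J%C3%BCrgen
console.log(twice); // name=J%25C3%25BCrgen
```

A server that decodes the body once would see the literal text J%C3%BCrgen in the second case, which is why the raw value is the one to pass in.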

When I use that AJAX with htmlentities() in my php, my umlauts look like this in plain text: UE �, AE �, OE �, ue ü, ae ä, oe o

If you must use htmlentities(), you have to tell it your encoding is UTF-8 in the optional $charset argument. Otherwise, in PHP before 5.4, it will (stupidly) default to treating all your bytes as ISO-8859-1 and encode them to inappropriate entity references, one for each byte.
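The "one for each byte" effect is easy to reproduce outside PHP. A minimal sketch, assuming Node.js (for Buffer); misreadAsLatin1 is a hypothetical helper name. It encodes a character as UTF-8 and then reads the resulting bytes back as ISO-8859-1, which is where pairs like Ã¼ come from.

```javascript
// Requires Node.js (Buffer is not available in browsers).
// Encode a string as UTF-8, then decode those bytes as if they were ISO-8859-1.
function misreadAsLatin1(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

console.log(Buffer.from('ü', 'utf8').length); // 2 (one character, two bytes)
console.log(misreadAsLatin1('ü'));            // Ã¼
console.log(misreadAsLatin1('ä'));            // Ã¤
```

Each of the two bytes is then turned into its own entity reference, so the page shows two wrong characters instead of one right one.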

Better is to use htmlspecialchars() instead, as it does not attempt to apply unnecessary encoding to characters other than the few ASCII characters that really need it.
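For comparison, here is roughly what "only the few ASCII characters that really need it" means, sketched in JavaScript. escapeHtml and HTML_ESCAPES are hypothetical names, and this is not PHP's implementation, just the same idea: umlauts pass through untouched, and only markup-significant characters are replaced.

```javascript
// Hypothetical helper: escapes only the characters that are special in HTML.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#039;'
};

function escapeHtml(text) {
  // A single pass with a lookup avoids escaping the '&' of an entity twice.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<b>Müller & Söhne</b>'));
// &lt;b&gt;Müller &amp; Söhne&lt;/b&gt;
```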

And like this in the database: UE Ü , AE Ä, OE Ö, ue ü, ae ä, oe o

How are you determining that? Does the tool you are using to grab data out of the database know about Unicode? (If it's a dodgy PHP web admin interface, maybe not. PHP isn't great at Unicode.)

It is possible that you're storing proper UTF-8 bytes in the database, but in tables marked as having a Latin-1 collation. This will work, in as much as you'll get the same bytes out as you put in, but if MySQL doesn't know they're UTF-8 bytes then case-insensitive string comparisons outside the ASCII range won't work right, so looking for Ä won't match ä. That may or may not matter to you.
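The case-insensitive mismatch can be imitated in a few lines. This is only an analogy, assuming Node.js: JavaScript's toLowerCase() stands in for a case-insensitive collation, and it is not how MySQL compares strings internally. misreadAsLatin1 is a hypothetical helper.

```javascript
// Requires Node.js. Simulates a store that believes UTF-8 bytes are Latin-1.
function misreadAsLatin1(text) {
  return Buffer.from(text, 'utf8').toString('latin1');
}

// The comparison when the text is known to be Unicode.
const knowsUtf8 = 'Ä'.toLowerCase() === 'ä'.toLowerCase();

// The same comparison on the misread strings: the second byte of each
// character is treated as an unrelated symbol, so case folding cannot pair them.
const thinksLatin1 =
  misreadAsLatin1('Ä').toLowerCase() === misreadAsLatin1('ä').toLowerCase();

console.log(knowsUtf8);    // true
console.log(thinksLatin1); // false
```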

If I don't use htmlentities() but mysql_real_escape_string() instead

Whoa, careful. HTML-escaping is for the output stage to the page. SQL-string-literal-escaping occurs when creating an SQL query. You need them both, but don't mix them up or attempt to do them at the same stage, or you'll have all sorts of weird escapes-gone-wrong and potential vulnerabilities.
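A rough sketch of keeping the two stages apart, in JavaScript. Everything here is hypothetical: buildInsert, renderCell and the people table are invented for illustration, no real database driver is involved, and the sketch swaps in a bound parameter where the question uses string-literal escaping. The point is only that the raw value goes to the database and HTML-escaping happens later, at output.

```javascript
// Hypothetical helpers; no real database driver is used here.

// SQL stage: keep the raw value and hand it over as a bound parameter.
function buildInsert(name) {
  return { sql: 'INSERT INTO people (name) VALUES (?)', params: [name] };
}

// Output stage: HTML-escape only when writing the value into a page.
function renderCell(name) {
  const escaped = name
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return '<td>' + escaped + '</td>';
}

const raw = 'Søren <& Co>';
console.log(buildInsert(raw).params[0]); // Søren <& Co>
console.log(renderCell(raw));            // <td>Søren &lt;&amp; Co&gt;</td>
```

Because the stored value is never HTML-escaped, it can be loaded back into an edit form later without entities piling up.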

bobince answered Sep 24 '22 01:09