I'm writing a Chrome extension that works with a website that uses ISO-8859-1. To give some context: what my extension does is make posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).
If the message contains characters like á, these characters appear as Ã¡ in the posted message. Forcing the browser to display the page as UTF-8 instead of ISO-8859-1 makes the á appear correctly.
It is my understanding that Javascript uses UTF-8 for its strings, so it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However, there seems to be no direct way to do this transcoding in Javascript, and I can't touch the server-side code. Any advice?
I've tried setting the created form to use iso-8859-1 like this:
var form = document.createElement("form");
form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";
And also:
var form = document.createElement("form");
form.encoding = "ISO-8859-1";
But that doesn't seem to work.
EDIT:
The problem actually lay in how jQuery was urlencoding the message (or something along the way). I fixed it by telling jQuery not to process the data and doing it myself, as shown in the following snippet:
function cfaqs_post_message(msg) {
    var url = cfaqs_build_post_url();
    // escape() encodes every character with a code point below 256 as %XX,
    // i.e. its ISO-8859-1 byte, which is what the server expects.
    // escape() leaves "+" untouched, so encode it manually so the server
    // does not decode it as a space.
    msg = escape(msg).replace(/\+/g, "%2B");
    $.ajax({
        type: "POST",
        url: url,
        processData: false,
        data: "message=" + msg + "&post=Preview Message",
        success: function(html) {
            // ...
        },
        dataType: "html",
        contentType: "application/x-www-form-urlencoded"
    });
}
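For reference, here is a rough sketch of the same manual encoding without the deprecated escape() function; this is my own illustration, not part of the original fix. It percent-encodes every code point below 256 as its single ISO-8859-1 byte and rejects anything outside that range:

function encodeLatin1Component(str) {
    var out = "";
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        var ch = str.charAt(i);
        if (code > 255) {
            // Not representable in ISO-8859-1 at all.
            throw new Error("Character outside ISO-8859-1: " + ch);
        }
        if (/[A-Za-z0-9\-_.~]/.test(ch)) {
            // Unreserved characters can be sent as-is.
            out += ch;
        } else {
            // Everything else becomes %XX, where XX is its Latin-1 byte.
            out += "%" + (code < 16 ? "0" : "") + code.toString(16).toUpperCase();
        }
    }
    return out;
}

// encodeLatin1Component("á")   -> "%E1"
// encodeLatin1Component("a b") -> "a%20b"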
UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.
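To make that concrete, here is a quick console check (modern browsers only; note that TextEncoder always produces UTF-8, which is fine for this comparison):

// "á" is Unicode code point U+00E1.
"á".charCodeAt(0).toString(16);  // "e1"  -> one byte 0xE1 in ISO-8859-1
new TextEncoder().encode("á");   // Uint8Array [0xC3, 0xA1] -> two bytes in UTF-8
// If those two UTF-8 bytes are then displayed as ISO-8859-1, they render as "Ã¡".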
Strings in JavaScript are encoded using UTF-16.
While a JavaScript source file can use any encoding, the engine converts it internally to UTF-16 before executing it. JavaScript strings are sequences of UTF-16 code units, as the ECMAScript standard says: "When a String contains actual textual data, each element is considered to be a single UTF-16 code unit."
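A small illustration of the UTF-16 code unit model (my example; the musical symbol U+1D11E lies outside the Basic Multilingual Plane, so it takes two code units):

"á".length;                      // 1 -> one UTF-16 code unit
"𝄞".length;                      // 2 -> a surrogate pair (two UTF-16 code units)
"𝄞".charCodeAt(0).toString(16);  // "d834" -> the high surrogate, not the code point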
It is my understanding that Javascript uses UTF-8 for its strings
No, no.
Each page has its charset encoding defined in a meta tag, just inside the head element:
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
or
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>
Besides that, each page file should be saved with the target charset encoding; otherwise it will not work as expected. It is also a good idea to define the target charset encoding on the server side.
Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
PHP
header("Content-Type: text/html; charset=UTF-8");
C#
I do not know how to...
It can also be a good idea to declare the charset on each script tag whose file contains sensitive characters (á, é, í, ó, ú, and so on):
<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>
...
So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem
No, no.
The target server may handle strings in an encoding other than ISO-8859-1. For instance, Tomcat decodes requests as ISO-8859-1 by default, no matter how you set up your page. So, on the server side, you may have to configure the request according to how you set up your page.
Java
request.setCharacterEncoding("UTF-8");
PHP
// I do not know how to...
If you really want to change the target charset encoding, try the following.
Internet Explorer
formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
Other browsers
formElement.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";
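A minimal sketch combining both, under the assumption that setting both properties is harmless and that browsers may or may not honor the charset part:

var formElement = document.createElement("form");
formElement.method = "POST";
// Standard DOM property:
formElement.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";
// Legacy property used by old Internet Explorer:
formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";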
Or you could provide a function that returns, for each character, its numeric representation in the Unicode character set; that works regardless of the target charset encoding. For instance, á written as a Unicode escape is \u00E1:
alert("á without its Unicode Character Set numerical representation");
function convertToUnicodeCharacterSet(value) {
if(value == "á")
return "\u00E1";
}
alert("á Numerical representation in Unicode Character Set is: " + convertToUnicodeCharacterSet("á"));
Here you can see it in action. You can use this link as a guideline (see the section on JavaScript escapes).
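A more general sketch of the same idea (my addition, not in the original answer): it turns every non-ASCII character into its \uXXXX escape text using charCodeAt, so the resulting string is pure ASCII:

function toUnicodeEscapes(value) {
    return value.replace(/[\u0080-\uFFFF]/g, function(ch) {
        // Pad the hex code unit to four digits, e.g. "á" -> "\u00e1".
        return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
    });
}

// toUnicodeEscapes("á")   -> "\\u00e1" (a six-character ASCII string)
// toUnicodeEscapes("abc") -> "abc"     (unchanged)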
Added to the original answer: here is how I implement the jQuery functionality:
var dataArray = $(formElement).serializeArray();
var queryString = "";
for (var i = 0; i < dataArray.length; i++) {
    // Build name=value pairs, URL-encoding each value.
    queryString += "&" + dataArray[i]["name"] + "=" + encodeURIComponent(dataArray[i]["value"]);
}
$.ajax({
    url: "url.htm",
    data: queryString,
    contentType: "application/x-www-form-urlencoded; charset=UTF-8",
    success: function(response) {
        // process response
    }
});
It works fine without any headache.
Regards,