Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I transcode a Javascript string to ISO-8859-1?

Tags:

I'm writing a Chrome extension that works with a website that uses ISO-8859-1. Just to give some context, what my extension does is making posting in the site's forums quicker by adding a more convenient post form. The value of the textarea where the message is written is then sent through an Ajax call (using jQuery).

If the message contains characters like á these characters appear as á in the posted message. Forcing the browser to display UTF-8 instead of ISO-8859-1 makes the á appear correctly.

It is my understanding that Javascript uses UTF-8 for its strings, so it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem. However there seems to be no direct way to do this transcoding in Javascript, and I can't touch the server side code. Any advice?

I've tried setting the created form to use iso-8859-1 like this:

var form = document.createElement("form");
form.enctype = "application/x-www-form-urlencoded; charset=ISO-8859-1";

And also:

var form = document.createElement("form");
form.encoding = "ISO-8859-1";

But that doesn't seem to work.

EDIT:

The problem actually lied in how jQuery was urlencoding the message (or something along the way), I fixed this by telling jQuery not to process the data and doing it myself as is shown in the following snippet:

function cfaqs_post_message(msg) {
  var url = cfaqs_build_post_url();
  msg = escape(msg).replace(/\+/g, "%2B");
  $.ajax({
    type: "POST",
    url: url,
    processData: false,
    data: "message=" + msg + "&post=Preview Message",
    success: function(html) {
      // ...
    },
    dataType: "html",
    contentType: "application/x-www-form-urlencoded"
  });
}
like image 555
Marcos Marin Avatar asked Feb 17 '10 19:02

Marcos Marin


People also ask

What is the difference between UTF-8 and ISO 8859-1?

UTF-8 is a multibyte encoding that can represent any Unicode character. ISO 8859-1 is a single-byte encoding that can represent the first 256 Unicode characters. Both encode ASCII exactly the same way.

What encoding do JavaScript strings use?

Strings in JavaScript are encoded using UTF-16.

Are JavaScript strings UTF-8?

While a JavaScript source file can have any kind of encoding, JavaScript will then convert it internally to UTF-16 before executing it. JavaScript strings are all UTF-16 sequences, as the ECMAScript standard says: When a String contains actual textual data, each element is considered to be a single UTF-16 code unit.


1 Answers

It is my understanding that Javascript uses UTF-8 for its strings

No, no.

Each page has its charset enconding defined in meta tag, just below head element

<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8"/>

or

<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1"/>

Besides that, each page should be edited with the target charset encoding. Otherwise, it will not work as expected.

And it is a good idea to define its target charset encoding on server side.

Java
<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

PHP
header("Content-Type: text/html; charset=UTF-8");

C#
I do not know how to...

And it could be a good idea to set up each script file whether it uses sensitive characters (á, é, í, ó, ú and so on...).

<script type="text/javascript" charset="UTF-8" src="/PATH/TO/FILE.js"></script>

...

So it is my theory that if I transcode the string to ISO-8859-1 before sending it, it should solve my problem

No, no.

The target server could handle strings in other than ISO-8859-1. For instance, Tomcat handles in ISO-8859-1, no matter how you set up your page. So, on server side, you could have to set up your request according how your set up your page.

Java
request.setCharacterEncoding("UTF-8")

PHP
// I do not know how to...

If you really want to translate the target charset encoding, TRY as follows

InternetExplorer
    formElement.encoding = "application/x-www-form-urlencoded; charset=ISO-8859-1";
ELSE
    formElement.enctype  = "application/x-www-form-urlencoded; charset=ISO-8859-1";

Or you should provide a function that gets the numeric representation, in Unicode Character Set, used by each character. It will work regardless of the target charset encoding. For instance, á as Unicode Character Set is \u00E1;

alert("á without its Unicode Character Set numerical representation");
function convertToUnicodeCharacterSet(value) {
    if(value == "á")
        return "\u00E1";
}
alert("á Numerical representation in Unicode Character Set is: " + convertToUnicodeCharacterSet("á"));

Here you can see in action:

You can use this link as guideline (See JavaScript escapes)

Added to original answer how I implement jQuery funcionality

var dataArray = $(formElement).serializeArray();
var queryString = "";
for(var i = 0; i < dataArray.length; i++) {
    queryString += "&" + dataArray[i]["name"] + "+" + encodeURIComponent(dataArray[i]["value"]);
}
$.ajax({
    url:"url.htm",
    data:dataString,
    contentType:"application/x-www-form-urlencoded; charset=UTF-8",
    success:function(response) {
        // proccess response
    });
});

It works fine without any headache.

Regards,

like image 116
Arthur Ronald Avatar answered Nov 05 '22 01:11

Arthur Ronald