Why does my UTF8 data from my mod_perl application still get garbled in the web browser?

Before I begin, I would like to highlight the structure of what I am working with.

  1. There is a text file from which a specific text is taken. The file is encoded in utf-8
  2. Perl takes the file and prints it into a page. Everything is displayed as it should be. Perl is set to use utf-8
  3. The web page Perl generates has the following header <meta content="text/html;charset=utf-8" http-equiv="content-type"/>. Hence it is utf-8
  4. After the first load, everything is loaded dynamically via jQuery/AJAX. By flipping through pages, it is possible to load the exact same text, only this time it is loaded by JavaScript. The Request has following header Content-Type: application/x-www-form-urlencoded; charset=UTF-8
  5. The Perl handler which processes the AJAX Request on the Backend delivers contents in utf-8
  6. The AJAX Handler calls up a function in our custom Framework. Before the Framework prints out the text, it is displayed correctly as "üöä". After being sent to the AJAX Handler, it reads "\x{c3}\x{b6}\x{c3}\x{a4}\x{c3}\x{bc}" which is the utf-8 representation of "üöä".
  7. After the AJAX Handler delivers its package to the client as JSON, the webpage prints the following: "Ã¶Ã¤Ã¼".
  8. The JS and Perl files themselves are saved in utf-8 (default setting in Eclipse)

These are the symptoms. I tried everything Google told me and I still have the problem. Does anyone have a clue what it could be? If you need any specific code snippet, tell me so and I'll try to paste it.

Edit 1

The Response Header from the AJAX Handler

Date: Mon, 09 Nov 2009 11:40:27 GMT
Server: Apache/2.2.10 (Linux/SUSE)
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset="utf-8"

200 OK

Answer

With the help of you folks and this page, I was able to track down the problem. Seems like the problem was not the encoding by itself, but rather Perl encoding my variable $text twice as utf-8 (according to the site). The solution was as simple as adding Encode::decode_utf8().
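
The double encoding described above can be reproduced in a few lines. This is only a sketch in Python rather than Perl, since the actual handler code is not shown; the variable `text` is a hypothetical stand-in for `$text`, and the Latin-1 step is an assumption about how the second encoding pass misreads the bytes.

```python
# Hypothetical stand-in for the $text value from the question.
text = "öäü"

# Correct: encode the character string to UTF-8 exactly once.
encoded_once = text.encode("utf-8")

# Bug: the UTF-8 bytes are mistaken for Latin-1 characters and encoded again.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

# What a browser shows when it decodes the double-encoded bytes as UTF-8.
garbled = encoded_twice.decode("utf-8")

# Fix: peel off the extra layer so only one level of UTF-8 remains.
repaired = garbled.encode("latin-1").decode("utf-8")

print(encoded_once)  # b'\xc3\xb6\xc3\xa4\xc3\xbc'
print(garbled)       # Ã¶Ã¤Ã¼
print(repaired)      # öäü
```

In Perl terms, decoding the incoming bytes into a character string first (which is what Encode::decode_utf8() does) means the output layer only has to encode once.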

I was searching in the completely wrong place to begin with. I thank you all who helped me search in the right place :)

#spreads some upvote love#

Asked Mar 01 '23 02:03 by Mike


1 Answer

returns the following: &#38;&#65;&#116;&#105;&#108;&#100;&#101;&#59;&#38;&#112;&#97;&#114;&#97;&#59;...

That's:

&Atilde;&para;&Atilde;&curren;&Atilde;&frac14;

Which says your AJAX handler is using an HTML-entity-encoding function for its output, that is assuming input from the ISO-8859-1 character set. You could use a character-reference encoder that knew about UTF-8 instead, but probably it will be easier just to encode the potentially-special characters <>&"' and no others.
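
To illustrate, here is a rough Python sketch of both behaviours. `latin1_entity_encode` is a hypothetical function written for this example (it is not the asker's framework code) that treats every input byte as one ISO-8859-1 character; `html.escape` from the standard library stands in for an encoder that touches only the special characters.

```python
import html
from html.entities import codepoint2name


def latin1_entity_encode(data: bytes) -> str:
    """Hypothetical encoder that wrongly treats each byte as one ISO-8859-1 character."""
    out = []
    for ch in data.decode("latin-1"):
        name = codepoint2name.get(ord(ch))
        # Replace every non-ASCII character with its named entity when one exists.
        out.append(f"&{name};" if name and ord(ch) > 127 else ch)
    return "".join(out)


utf8_bytes = "öäü".encode("utf-8")

# Feeding UTF-8 bytes to the Latin-1 encoder yields two entities per character.
wrong = latin1_entity_encode(utf8_bytes)
print(wrong)  # &Atilde;&para;&Atilde;&curren;&Atilde;&frac14;

# Escaping only <, >, &, " and ' leaves the non-ASCII characters untouched.
safe = html.escape('<b>öäü & "more"</b>', quote=True)
print(safe)  # &lt;b&gt;öäü &amp; &quot;more&quot;&lt;/b&gt;
```

With the second approach the response body still has to be sent as UTF-8, but the entity encoder no longer needs to know anything about character sets.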

The Request has following header Content-Type: application/x-www-form-urlencoded; charset=UTF-8

There is no such parameter as charset for the MIME type application/x-www-form-urlencoded. This will be ignored. Form-encoded strings are inherently byte-based; it is up to the application to decide what character set they are treated as (if any; maybe the application does just want bytes).
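
A small Python sketch of what "inherently byte-based" means in practice: the percent-encoded string only carries bytes, and the receiving side picks the character set when it decodes. The sample string is an assumption for illustration; the functions are from the standard `urllib.parse` module.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# The sender percent-encodes the UTF-8 bytes of the text.
encoded = quote("öäü", encoding="utf-8")
print(encoded)  # %C3%B6%C3%A4%C3%BC

# On the wire there are only bytes; nothing in the string names a charset.
raw = unquote_to_bytes(encoded)
print(raw)  # b'\xc3\xb6\xc3\xa4\xc3\xbc'

# The receiver decides how to interpret those bytes.
as_utf8 = unquote(encoded, encoding="utf-8")
as_latin1 = unquote(encoded, encoding="latin-1")
print(as_utf8)    # öäü
print(as_latin1)  # Ã¶Ã¤Ã¼
```

Both decodings succeed without error, which is why a wrong guess on the server side shows up as garbled text rather than as a failure.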

Answered Mar 05 '23 19:03 by bobince