I'm making a JavaScript app which retrieves .json files with jQuery and injects data into the webpage it is embedded in. The .json files are encoded with UTF-8 and contain accented chars like é, ö and å.

The problem is that I don't control the charset on the pages that are going to use the app. Some will be using UTF-8, but others will be using the ISO-8859-1 charset. This will of course garble the special chars from the .json files.

How do I convert special UTF-8 chars to their ISO-8859-1 equivalents using JavaScript?
In Java, if you already have the raw bytes, the conversion is a one-liner:

byte[] utf8 = ...
byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");

You can exercise more control by using the lower-level Charset APIs. For example, you can raise an exception when an un-encodable character is found, or use a different character for replacement text.
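In browser JavaScript there is no byte-oriented String constructor, but a minimal sketch of the same round-trip using the standard TextDecoder API could look like this (the helper name and the "?" replacement character are my own choices):

function utf8BytesToLatin1Bytes(utf8Bytes) {
  // Decode the UTF-8 bytes (a Uint8Array) into a JS string.
  var text = new TextDecoder("utf-8").decode(utf8Bytes);
  // Map each code unit to a single ISO-8859-1 byte; anything
  // above U+00FF can't be represented, so substitute "?" (0x3F).
  var latin1 = new Uint8Array(text.length);
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    latin1[i] = code <= 0xFF ? code : 0x3F;
  }
  return latin1;
}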
UTF-8 is a multibyte encoding that can represent any Unicode character. ISO-8859-1 is a single-byte encoding that can represent only the first 256 Unicode code points (U+0000-U+00FF). The two overlap: every ASCII character (0x00-0x7F) is encoded as the same single byte in ASCII, ISO-8859-1 and UTF-8, while the remaining ISO-8859-1 characters (0x80-0xFF) map to the same Unicode code points but take two bytes each in UTF-8.
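A quick console check shows both the overlap and the divergence:

console.log("é".charCodeAt(0).toString(16)); // "e9": one byte, 0xE9, in ISO-8859-1
console.log(encodeURIComponent("é"));        // "%C3%A9": two bytes in UTF-8
console.log(encodeURIComponent("A"));        // "A": ASCII, identical in both encodings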
Actually, everything is typically stored as Unicode of some kind internally, but let's not go into that. I'm assuming you're getting the iconic "åäö" type strings because you're using ISO-8859 as your character encoding. There's a trick you can do to convert those characters. The escape and unescape functions used for encoding and decoding query strings are defined for ISO characters, whereas the newer encodeURIComponent and decodeURIComponent, which do the same thing, are defined for UTF-8 characters.
escape encodes extended ISO-8859-1 characters (Unicode code points U+0080-U+00FF) as %xx (two-digit hex), whereas it encodes code points U+0100 and above as %uxxxx (%u followed by four-digit hex). For example, escape("å") == "%E5" and escape("あ") == "%u3042".

encodeURIComponent percent-encodes extended characters as a UTF-8 byte sequence. For example, encodeURIComponent("å") == "%C3%A5" and encodeURIComponent("あ") == "%E3%81%82".
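All four cases are easy to verify in a browser console:

console.log(escape("å"));               // "%E5" (one ISO-8859-1 byte)
console.log(escape("あ"));              // "%u3042" (outside Latin-1, %u notation)
console.log(encodeURIComponent("å"));   // "%C3%A5" (two UTF-8 bytes)
console.log(encodeURIComponent("あ"));  // "%E3%81%82" (three UTF-8 bytes)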
So you can do:

fixedstring = decodeURIComponent(escape(utfstring));

For example, an incorrectly decoded "å" shows up as "Ã¥". Here escape("Ã¥") == "%C3%A5", which is the two incorrect ISO characters encoded as single bytes. Then decodeURIComponent("%C3%A5") == "å", where the two percent-encoded bytes are interpreted as a UTF-8 sequence.
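The whole repair as a console session (the variable names are mine):

var garbled = "Ã¥";                      // "å" that was read as ISO-8859-1
var escaped = escape(garbled);           // "%C3%A5": one %xx per character
var fixed = decodeURIComponent(escaped); // "å": the bytes re-read as UTF-8
console.log(fixed);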
If you need to do the reverse for some reason, that works too:
utfstring = unescape(encodeURIComponent(originalstring));
Is there a way to differentiate between garbled UTF-8 strings and plain ISO strings? It turns out there is. The decodeURIComponent function used above will throw an error if given a malformed encoded sequence. We can use this to detect, with high probability, whether our string is UTF-8 or ISO.
var fixedstring;
try {
  // If the string is UTF-8, this will work and not throw an error.
  fixedstring = decodeURIComponent(escape(badstring));
} catch (e) {
  // If it isn't, an error will be thrown, and we can assume that we have an ISO string.
  fixedstring = badstring;
}
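The same logic wrapped into a reusable helper (the function name fixEncoding is my own):

function fixEncoding(s) {
  try {
    return decodeURIComponent(escape(s)); // valid UTF-8: run the trick
  } catch (e) {
    return s; // malformed sequence: assume it was ISO all along
  }
}

console.log(fixEncoding("Ã¥")); // "å" (garbled UTF-8, repaired)
console.log(fixEncoding("å"));  // "å" (already fine, left untouched)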
The problem is that once the page is served up, the content is going to be in the encoding described in the Content-Type meta tag. The content in the "wrong" encoding is already garbled.

You're best off doing this on the server before serving up the page. Or, as I have been known to say: UTF-8 end-to-end or die.
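If the server happens to be Node.js, a minimal sketch of that fix is to serve the JSON with an explicit charset so the browser never has to guess (the file name and port are placeholders):

var http = require("http");
var fs = require("fs");

http.createServer(function (req, res) {
  // Declare the encoding explicitly instead of relying on the page's charset.
  res.setHeader("Content-Type", "application/json; charset=utf-8");
  res.end(fs.readFileSync("data.json"));
}).listen(8080);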