What are the implications of changing the HTML encoding from UTF-8 to UTF-16? I would like to know your thoughts on the issue. Is there anything I need to consider before making such a change? Note: I am interested because I need to handle enormous amounts of Japanese and Chinese text.

What could go wrong in switching HTML encoding from UTF-8 to UTF-16?

5 Answers

I can think of a few things that will go wrong:

You MUST specify that it's UTF-16 in the HTTP Content-Type header. Unlike UTF-8, UTF-16 is not ASCII compatible, so a client cannot read any ASCII markup in the page until it already knows the encoding. Everything needs to be in UTF-16 from the start.
Older clients don't support UTF-16. For example, anything on Windows 9x. Possibly Mac OS 9 as well.
Oh, wait, I almost forgot: North American and European copies of Windows XP don't have Asian fonts installed by default.
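
To illustrate the ASCII-compatibility point, here is a minimal Python sketch. The sample markup and the variable names are made up for illustration; it only shows that an ASCII tag is still findable in the UTF-8 bytes but not in the UTF-16 bytes.

```python
# Hypothetical sample page, used only for illustration.
html = "<html><head><title>日本語</title></head></html>"

utf8_bytes = html.encode("utf-8")
utf16_bytes = html.encode("utf-16-le")  # little-endian, no byte order mark

# In UTF-8 the ASCII tag survives byte-for-byte, so a client that
# guesses ASCII can still recognise the markup.
tag_visible_in_utf8 = b"<title>" in utf8_bytes

# In UTF-16 every ASCII character occupies two bytes (here the character
# followed by a zero byte), so the ASCII byte sequence never appears.
tag_visible_in_utf16 = b"<title>" in utf16_bytes

print(tag_visible_in_utf8)    # True
print(tag_visible_in_utf16)   # False
print(utf16_bytes[:4])        # b'<\x00h\x00'
```

This is why the encoding has to be known before the first byte of markup is parsed, for example from a header such as `Content-Type: text/html; charset=utf-16`.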

187

answered Oct 01 '22 09:10

Powerlord

Your bandwidth consumption is likely to nearly double, assuming most of your HTML is ASCII.
Clients that incorrectly assume UTF-8 (or ASCII) will misread the page.

Why do you want to change to UTF-16?
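
The bandwidth point can be checked with a quick Python sketch. The sample strings and the helper name `encoded_sizes` are assumptions for illustration. ASCII markup doubles in size under UTF-16, while Japanese text in the Basic Multilingual Plane shrinks from three bytes per character to two, so the net effect depends on the ratio of markup to text.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count without a BOM)."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical samples: pure ASCII markup and eight Japanese characters.
markup = '<div class="article"><p></p></div>'
japanese = "日本語のテキスト"

markup_utf8, markup_utf16 = encoded_sizes(markup)
text_utf8, text_utf16 = encoded_sizes(japanese)

print(markup_utf16 == 2 * markup_utf8)  # True: ASCII doubles in UTF-16
print(text_utf8, text_utf16)            # 24 16
```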

answered Oct 02 '22 09:10

Jon Skeet

There is also the byte order, which becomes an issue with any encoding whose code units are wider than 8 bits. UTF-16 files typically begin with a byte order mark (U+FEFF), which is used to determine the byte order, or endianness, of that file.

Wikipedia has a quite good explanation of this.
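
As a rough illustration of the byte order issue, the following Python sketch encodes the same two characters in both byte orders and uses the byte order mark to pick the right decoder. The helper name `sniff_utf16` is hypothetical; the `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` constants are from the standard library.

```python
import codecs

text = "日本"

# The same characters, serialised in each byte order with a leading BOM.
big_endian = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
little_endian = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

def sniff_utf16(data: bytes) -> str:
    """Decode UTF-16 data by inspecting its leading byte order mark."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    raise ValueError("no UTF-16 byte order mark found")

print(big_endian.hex(" "))     # fe ff 65 e5 67 2c
print(little_endian.hex(" "))  # ff fe e5 65 2c 67

# Guessing the wrong byte order silently yields different characters.
misread = text.encode("utf-16-be").decode("utf-16-le")
print(misread == text)         # False
```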

answered Oct 04 '22 09:10

FeatureCreep

As far as I know, all modern browsers support UTF-16 encoding. But as others have pointed out, you should declare the encoding explicitly. Not all browsers and platforms will support all Unicode characters, but I think this is true regardless of which encoding you use.

However, if bandwidth is a big issue, you should probably consider gzipping the HTML. This will save much more bandwidth than switching encodings.
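
A small Python sketch of the gzip suggestion, using the standard `gzip` module. The sample page and the helper name `raw_and_gzipped` are made up for illustration, and the sample is far more repetitive than real HTML, so the compression ratio it prints is optimistic. Measure with your own pages before deciding.

```python
import gzip

def raw_and_gzipped(page: str, encoding: str) -> tuple[int, int]:
    """Return (raw byte count, gzip-compressed byte count) for one encoding."""
    raw = page.encode(encoding)
    return len(raw), len(gzip.compress(raw))

# Hypothetical page: a repeated list item mixing ASCII markup and Japanese.
page = '<li><a href="/articles/1">記事のタイトル</a></li>\n' * 200

for encoding in ("utf-8", "utf-16-le"):
    raw_size, gzipped_size = raw_and_gzipped(page, encoding)
    print(encoding, raw_size, gzipped_size)
```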

answered Oct 04 '22 09:10

JacquesB

The W3C's Character Model for the World Wide Web: Fundamentals states, "When a unique character encoding is required, the character encoding MUST be UTF-8, UTF-16 or UTF-32. US-ASCII is upwards-compatible with UTF-8 (an US-ASCII string is also a UTF-8 string, see [RFC 3629]), and UTF-8 is therefore appropriate if compatibility with US-ASCII is desired." In practice, compatibility with US-ASCII is so useful that it's almost a requirement. The W3C goes on to explain, "In other situations, such as for APIs, UTF-16 or UTF-32 may be more appropriate. Possible reasons for choosing one of these include efficiency of internal processing and interoperability with other processes."

answered Oct 04 '22 09:10

web marketing melbourne
