
How do I correctly insert unicode in an HTML title using JavaScript?

I'm seeing some weird behavior when setting the title of an HTML page using JavaScript. If I put HTML character references directly into the title, the Unicode renders correctly, for instance:

<title>&#21543;&#20986;</title>

But if I use HTML character references via JavaScript, something seems to convert the & to &amp;, which breaks the reference and causes the title to be rendered as the full coded string:

function execTitleChange() {
  document.title = "&#21543;&#20986;";
}

(I should note that this is partly speculation; when I inspect the DOM using Firebug after executing this JavaScript function, that's where I see &amp; instead of &.)

If I use \u encoded Unicode characters when setting the value from JavaScript then everything works correctly again:

function execTitleChange() {
  document.title = "\u5427\u51fa";
}

The fact that \u-encoded characters work kind of makes sense to me, since I think that's how JavaScript represents Unicode characters, but I'm stumped as to why the behavior differs when using HTML character references.

Asked by BenG on Aug 24 '12 at 18:08





1 Answer

JavaScript string literals are parsed by the JavaScript parser. Text inside HTML tags is parsed by the HTML parser. The two languages (and, by extension, their parsers) are different, and in particular they have different ways of representing characters by character code.

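Thus, what you've discovered is the way reality actually is :-) Use the \u escape notation in JavaScript, and use HTML numeric character references (&#nnnn;) in HTML/XML. Assigning to document.title sets the title's text rather than handing markup to the HTML parser, so the & in your string is a literal ampersand; the &amp; you see in Firebug is just how a literal & is written when the DOM is displayed as HTML source.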
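To make the difference concrete, here is a minimal sketch. It shows that the JavaScript parser turns the \u escapes into two characters while leaving the character references as sixteen ordinary characters. The helper decodeNumericCharRefs is hypothetical (it is not a built-in) and is deliberately simplified: it handles only numeric references, not named entities such as &amp;, and it does not reproduce every special case of a real HTML parser.

```javascript
// Both strings are meant to represent the same two characters.
const fromEscapes = "\u5427\u51fa";        // decoded by the JavaScript parser
const fromReferences = "&#21543;&#20986;"; // just text as far as JavaScript is concerned

console.log(fromEscapes.length);    // 2
console.log(fromReferences.length); // 16

// Hypothetical helper: converts decimal (&#21543;) and hexadecimal (&#x5427;)
// numeric character references into the characters they name.
function decodeNumericCharRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values untouched rather than throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericCharRefs(fromReferences) === fromEscapes); // true
```

In a browser, something like document.title = decodeNumericCharRefs("&#21543;&#20986;") should then behave the same as assigning the \u-escaped string, since the title receives already-decoded text.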

Edit: the situation can get even more confusing when you create or insert HTML from JavaScript. When you use .innerHTML to update the DOM, you are basically handing HTML source code to the HTML parser for interpretation. For that reason you can use either JavaScript \u escapes or HTML entities, and both will work (barring painful issues such as character encoding mismatches).

Finally, note that JavaScript also provides the String.fromCharCode() function to construct strings from numeric character codes.
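As a short illustration using the code values from the question, String.fromCharCode() builds the same string as the \u escapes. One caveat worth knowing: it works on 16-bit UTF-16 code units, so for characters above U+FFFF the ES2015 function String.fromCodePoint() is usually the more convenient choice. The variable names below are only for illustration.

```javascript
// 21543 and 20986 are the decimal values from the question's &#21543;&#20986;
const viaCharCode = String.fromCharCode(21543, 20986);
const viaEscapes = "\u5427\u51fa";

console.log(viaCharCode === viaEscapes); // true

// Code points above U+FFFF need two UTF-16 code units (a surrogate pair).
// String.fromCodePoint accepts the code point directly.
const astral = String.fromCodePoint(0x1f600);
console.log(astral.length);                   // 2
console.log(astral === "\ud83d\ude00");       // true
```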

Answered by Pointy on Sep 17 '22 at 15:09
