Encode HTML entities in JavaScript

People also ask

What is HTML entity encoding?

HTML encoding converts characters that are not allowed in HTML into character-entity equivalents; HTML decoding reverses the encoding. For example, when embedded in a block of text, the characters < and > are encoded as < and > for HTTP transmission.

You can use regex to replace any character in a given unicode range with its html entity equivalent. The code would look something like this:

var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/g, function(i) {
   return '&#'+i.charCodeAt(0)+';';
});

This code will replace all characters in the given range (unicode 00A0 - 9999, as well as ampersand, greater & less than) with their html entity equivalents, which is simply &#nnn; where nnn is the unicode value we get from charCodeAt.

See it in action here: http://jsfiddle.net/E3EqX/13/ (this example uses jQuery for element selectors used in the example. The base code itself, above, does not use jQuery)

Making these conversions does not solve all the problems -- make sure you're using UTF8 character encoding, make sure your database is storing the strings in UTF8. You still may see instances where the characters do not display correctly, depending on system font configuration and other issues out of your control.

Documentation

String.charCodeAt - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt
HTML Character entities - http://www.chucke.com/entities.html

The currently accepted answer has several issues. This post explains them, and offers a more robust solution. The solution suggested in that answer previously had:

var encodedStr = rawStr.replace(/[\u00A0-\u9999<>\&]/gim, function(i) {
  return '&#' + i.charCodeAt(0) + ';';
});

The i flag is redundant since no Unicode symbol in the range from U+00A0 to U+9999 has an uppercase/lowercase variant that is outside of that same range.

The m flag is redundant because ^ or $ are not used in the regular expression.

Why the range U+00A0 to U+9999? It seems arbitrary.

Anyway, for a solution that correctly encodes all except safe & printable ASCII symbols in the input (including astral symbols!), and implements all named character references (not just those in HTML4), use the he library (disclaimer: This library is mine). From its README:

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.

Also see this relevant Stack Overflow answer.

I had the same problem and created 2 functions to create entities and translate them back to normal characters. The following methods translate any string to HTML entities and back on String prototype

/**
 * Convert a string to HTML entities
 */
String.prototype.toHtmlEntities = function() {
    return this.replace(/./gm, function(s) {
        // return "&#" + s.charCodeAt(0) + ";";
        return (s.match(/[a-z0-9\s]+/i)) ? s : "&#" + s.charCodeAt(0) + ";";
    });
};

/**
 * Create string from HTML entities
 */
String.fromHtmlEntities = function(string) {
    return (string+"").replace(/&#\d+;/gm,function(s) {
        return String.fromCharCode(s.match(/\d+/gm)[0]);
    })
};

You can then use it as following:

var str = "Test´†®¥¨©˙∫ø…ˆƒ∆÷∑™ƒ∆æøπ£¨ ƒ™en tést".toHtmlEntities();
console.log("Entities:", str);
console.log("String:", String.fromHtmlEntities(str));

Output in console:

Entities: &#68;&#105;&#116;&#32;&#105;&#115;&#32;&#101;&#180;&#8224;&#174;&#165;&#168;&#169;&#729;&#8747;&#248;&#8230;&#710;&#402;&#8710;&#247;&#8721;&#8482;&#402;&#8710;&#230;&#248;&#960;&#163;&#168;&#160;&#402;&#8482;&#101;&#110;&#32;&#116;&#163;&#101;&#233;&#115;&#116;
String: Dit is e´†®¥¨©˙∫ø…ˆƒ∆÷∑™ƒ∆æøπ£¨ ƒ™en t£eést

This is an answer for people googling how to encode html entities, since it does not really address the question regarding the <sup> wrapper and symbols entities.

For HTML tag entities (&, <, and >), without any library, if you do not need to support IE < 9, you could create a html element and set its content with Node.textContent:

var str = "<this is not a tag>";
var p = document.createElement("p");
p.textContent = str;
var converted = p.innerHTML;

Here is an example: https://jsfiddle.net/1erdhehv/

Related questions
                            
                                How do I work around mutability in moment.js?
                            
                                Node.js ES6 classes with require
                            
                                Rendering React Components from Array of Objects
                            
                                How to remove all click event handlers using jQuery?
                            
                                subtle differences between JavaScript and Lua [closed]
                            
                                Handling optional parameters in javascript
                            
                                Vuex: Access State From Another Module
                            
                                Javascript parseInt() with leading zeros
                            
                                How to check if element has any children in Javascript?
                            
                                Functions that return a function
                            
                                Running V8 Javascript Engine Standalone
                            
                                What is the difference between jQuery version 1, version 2, and version 3? [closed]
                            
                                Is it correct to use JavaScript Array.sort() method for shuffling?
                            
                                Case insensitive regex in JavaScript
                            
                                Detect Android phone via Javascript / jQuery
                            
                                Uploading base64 encoded Image to Amazon S3 via Node.js
                            
                                How do I use WebStorm for Chrome Extension Development?
                            
                                What is ECMAScript?
                            
                                What is the difference between .map, .every, and .forEach?
                            
                                How can I get the sha1 hash of a string in node.js?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Encode HTML entities in JavaScript

Tags:

javascript

html

People also ask

Recent Activity

Donate For Us