
A plain JavaScript way to decode HTML entities that works in both browsers and Node

How can I decode HTML entities like &nbsp; and &#39; back to their original characters?

In browsers we can create a DOM element to do the trick (see here), or we can use a library like he.

In Node.js we can use a third-party library like html-entities.

What if we want to use plain JavaScript to do the job?

There are many similar questions and useful answers on Stack Overflow, but I couldn't find an approach that works in both browsers and Node.js, so I'd like to share mine.

I have posted my approach as an answer below. I hope it helps someone. :)

Henry He asked May 26 '17 06:05


1 Answer

For HTML codes like &nbsp; &lt; &gt; &#39; and even Chinese characters written as numeric references, I suggest using this function (inspired by some other answers):

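function decodeEntities(encodedString) {
    // Named entities this function knows about; any other named entity is left as-is.
    var translate = {
        "nbsp": " ",   // maps to a regular space, not U+00A0
        "amp" : "&",
        "quot": "\"",
        "lt"  : "<",
        "gt"  : ">"
    };
    // Handle named and decimal references in a single pass, so that
    // "&amp;#39;" decodes to "&#39;" and is not decoded a second time.
    var entity_re = /&(?:(nbsp|amp|quot|lt|gt)|#(\d+));/g;
    return encodedString.replace(entity_re, function(match, entity, numStr) {
        if (entity) {
            return translate[entity];
        }
        var num = parseInt(numStr, 10);
        // fromCodePoint also covers code points above 0xFFFF;
        // out-of-range values are returned unchanged instead of throwing.
        return num <= 0x10FFFF ? String.fromCodePoint(num) : match;
    });
}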

This implementation also works in Node.js:

decodeEntities("&#21704;&#21704;&nbsp;&#39;&#36825;&#20010;&#39;&amp;&quot;&#37027;&#20010;&quot;&#22909;&#29609;&lt;&gt;") //哈哈 '这个'&"那个"好玩<>
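The function above only covers decimal references such as &#39;. Hexadecimal references such as &#x27; are also common in real HTML. As a rough sketch only (the name decodeNumericEntities is hypothetical and not part of the answer above), decimal and hexadecimal references could be decoded together like this, assuming an ES2015 environment where String.fromCodePoint is available:

```javascript
// Hypothetical helper, not part of the original answer: decodes decimal (&#39;)
// and hexadecimal (&#x27;) numeric character references in one pass.
function decodeNumericEntities(encodedString) {
    return encodedString.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
        var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
        // Leave out-of-range references untouched instead of throwing a RangeError.
        if (codePoint > 0x10FFFF) {
            return match;
        }
        // String.fromCodePoint (ES2015) handles code points above 0xFFFF, such as emoji.
        return String.fromCodePoint(codePoint);
    });
}

console.log(decodeNumericEntities("&#x27;&#39;&#x1F600;&#128512;")); // ''😀😀
```

Named entities are deliberately left alone here, so this sketch would need to be combined with a lookup table like the one in decodeEntities to handle &amp; and friends.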

As a new user, I only have 1 reputation :(

I can't comment on or answer existing posts, so posting this question and answer is the only thing I can do for now.

Edit 1

I think this answer works even better than mine, although no one has upvoted it.

Henry He answered Nov 09 '22 00:11