javascript regex replace html chars

I'm using JavaScript to set the value of an input with text that may contain HTML-specific characters such as &amp; and &nbsp;. I'm trying to find a single regex that will match these entities and replace them with the appropriate characters ("&" and " ", respectively), but I can't figure out the regex to do it.

Here's my attempt:

Make an object that maps each match to its replacement value:

var specialChars = {
  "&nbsp;" : " ",
  "&amp;"  : "&",
  "&gt;"   : ">",
  "&lt;"   : "<"
};

Then, I want to match my string

var stringToMatch = "This string has special chars &amp; and &nbsp;";

I tried something like

stringToMatch.replace(/(&nbsp;|&amp;)/g, specialChars["$1"]);

but it doesn't work. I don't really understand how to capture the special tag and replace it. Any help is greatly appreciated.

brad asked Aug 04 '09 19:08


1 Answer

I think you can use the functions from a question on a slightly different subject (Efficiently replace all accented characters in a string?).

Jason Bunting's answer there has some nice ideas and the necessary explanation. Here is his solution with some modifications to get you started (if you find this helpful, upvote his original answer as well, since this is essentially his code).

var replaceHtmlEntites = (function() {
    var translate_re = /&(nbsp|amp|quot|lt|gt);/g,
        translate = {
            'nbsp': String.fromCharCode(160), 
            'amp' : '&', 
            'quot': '"',
            'lt'  : '<', 
            'gt'  : '>'
        },
        translator = function($0, $1) { 
            return translate[$1]; 
        };

    return function(s) {
        return s.replace(translate_re, translator);
    };
})();

callable as

var stringToMatch = "This string has special chars &amp; and &nbsp;";
var stringOutput  = replaceHtmlEntites(stringToMatch);
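
As for why the attempt in the question fails: specialChars["$1"] is evaluated once, before replace() ever runs, so it looks up the literal key "$1", gets undefined, and every match is replaced with the text "undefined". Passing a function defers the lookup until each match is found. A minimal sketch that reuses the question's specialChars map (decodeWithMap is just an illustrative name, not something from the original code):

```javascript
// Map from the question: entity text -> replacement character.
var specialChars = {
    "&nbsp;": " ",
    "&amp;": "&",
    "&gt;": ">",
    "&lt;": "<"
};

// decodeWithMap is a hypothetical helper name used only for this sketch.
function decodeWithMap(s) {
    // The callback runs once per match; `match` is the full entity text,
    // which is exactly the key used in specialChars.
    return s.replace(/&(?:nbsp|amp|gt|lt);/g, function (match) {
        return specialChars[match];
    });
}

var out = decodeWithMap("This string has special chars &amp; and &nbsp;");
```

Because the string is scanned in a single pass, a double-encoded entity such as &amp;lt; is decoded only one level (to &lt;), not all the way to <.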

Numbered entities are even easier: you can replace them much more generically using a little math and String.fromCharCode().
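
A rough sketch of what that could look like, assuming both decimal (&#160;) and hexadecimal (&#xA0;) references should be handled. The function name replaceNumericEntities is made up for illustration, and leaving code points above 0xFFFF untouched is a choice made here because String.fromCharCode() only takes 16-bit code units:

```javascript
// replaceNumericEntities is an illustrative name, not part of the original answer.
function replaceNumericEntities(s) {
    // Group 1 captures hex digits (&#xA0;), group 2 captures decimal digits (&#160;).
    // The i flag also accepts an uppercase X and uppercase hex digits.
    return s.replace(/&#(?:x([0-9a-f]+)|([0-9]+));/gi, function (match, hex, dec) {
        var code = hex ? parseInt(hex, 16) : parseInt(dec, 10);
        // String.fromCharCode works on 16-bit code units, so anything
        // above 0xFFFF is returned unchanged instead of being mangled.
        return code <= 0xFFFF ? String.fromCharCode(code) : match;
    });
}

var decoded = replaceNumericEntities("&#65;&#x42;&#X43; and&#160;more");
```

Where String.fromCodePoint() is available, it could be swapped in to cover the code points above 0xFFFF as well.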


Another, much simpler possibility is to let the browser's own HTML parser do the decoding, which covers every entity the browser knows (works in any browser):

function replaceHtmlEntites(string) {
    var div = document.createElement("div");
    div.innerHTML = string;
    return div.textContent || div.innerText || "";
}

replaceHtmlEntites("This string has special chars &lt; &amp; &gt;");
// -> "This string has special chars < & >"

Tomalak answered Nov 15 '22 00:11