
How do I strip all html tags in javascript with exceptions?

I've been beating my head against this regex for the longest time and am hoping someone can help. Basically, I have a WYSIWYG field where a user can type formatted text, but of course they will copy and paste from Word, the web, etc. So I have a JS function that catches the input on paste. I have a function that strips ALL of the formatting from the text, which is nice, but I'd like it to leave tags like p and br so the result isn't just a big mess.

Any regex ninjas out there? Here is what I have so far and it works. Just need to allow tags.

Code Monkey asked Dec 12 '22 23:12

1 Answer

The browser already has a perfectly good parsed HTML tree in o.node. Serialising the document content to HTML (using innerHTML), trying to hack it about with regex (which cannot parse HTML reliably), then re-parsing the results back into document content by setting innerHTML... is just a bit perverse really.
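
To make the "cannot parse HTML reliably" point concrete, here is a small sketch of two ways a tag-stripping regex goes wrong. The pattern below is hypothetical, since the asker's own regex isn't shown; it is just a typical "remove every tag except p and br" attempt.

```javascript
// Hypothetical "strip every tag except <p> and <br>" regex, for illustration only.
var STRIP_TAGS = /<(?!\/?(?:p|br)\b)[^>]*>/gi;

function stripWithRegex(html) {
    return html.replace(STRIP_TAGS, '');
}

// 1. A ">" inside an attribute value ends the match early and leaks markup.
var leaked = stripWithRegex('<p>ok</p><a title="1 > 0" href="#">link</a>');

// 2. Allowed tags keep every attribute, including event handlers.
var unsafe = stripWithRegex('<p onclick="steal()">hi</p><span>there</span>');

console.log(leaked);  // <p>ok</p> 0" href="#">link
console.log(unsafe);  // <p onclick="steal()">hi</p>there
```

Patching the pattern for these two cases just moves the problem to comments, CDATA, unclosed tags and so on, which is why working on the parsed nodes is the saner route.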

Instead, inspect the element and attribute nodes you already have inside o.node and remove the ones you don't want, e.g.:

filterNodes(o.node, {p: [], br: [], a: ['href']});

Defined as:

// Remove elements and attributes that do not meet a whitelist lookup of lowercase element
// name to list of lowercase attribute names.
function filterNodes(element, allow) {
    // Recurse into child elements (iterating over a copy, since childNodes is live)
    Array.fromList(element.childNodes).forEach(function(child) {
        if (child.nodeType===1) {
            filterNodes(child, allow);

            var tag= child.tagName.toLowerCase();
            // Own-property check, so a tag called e.g. `constructor` is not treated as allowed
            if (Object.prototype.hasOwnProperty.call(allow, tag)) {

                // Remove unwanted attributes
                Array.fromList(child.attributes).forEach(function(attr) {
                    if (allow[tag].indexOf(attr.name.toLowerCase())===-1)
                        child.removeAttributeNode(attr);
                });

            } else {

                // Replace unwanted elements with their contents
                while (child.firstChild)
                    element.insertBefore(child.firstChild, child);
                element.removeChild(child);
            }
        }
    });
}
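
If you want to poke at the logic outside a browser, the same walk can be sketched on a plain-object tree. Everything here is a stand-in I made up for illustration: the `{tag, attrs, children}` node shape and the `filterTree` name are not DOM APIs, and unlike filterNodes this version builds a new tree rather than editing in place.

```javascript
// Stand-in node shape (an assumption for this sketch, not a DOM API):
//   element: { tag: 'p', attrs: { name: value }, children: [ ... ] }
//   text:    a plain string
function filterTree(children, allow) {
    var out = [];
    children.forEach(function (child) {
        if (typeof child === 'string') {       // text node: always kept
            out.push(child);
            return;
        }
        // Filter descendants first, as filterNodes does.
        var kids = filterTree(child.children, allow);
        var tag = child.tag.toLowerCase();
        if (Object.prototype.hasOwnProperty.call(allow, tag)) {
            // Allowed element: keep it, but only with its allowed attributes.
            var attrs = {};
            Object.keys(child.attrs).forEach(function (name) {
                if (allow[tag].indexOf(name.toLowerCase()) !== -1)
                    attrs[name] = child.attrs[name];
            });
            out.push({ tag: tag, attrs: attrs, children: kids });
        } else {
            // Disallowed element: drop the wrapper, keep its contents in place.
            Array.prototype.push.apply(out, kids);
        }
    });
    return out;
}

var pasted = [
    { tag: 'DIV', attrs: { 'class': 'MsoNormal' }, children: [
        { tag: 'p', attrs: { style: 'color:red' }, children: [
            'Hello ',
            { tag: 'span', attrs: {}, children: ['world'] },
            { tag: 'a', attrs: { href: '/x', onclick: 'steal()' }, children: ['link'] }
        ] }
    ] }
];

var cleaned = filterTree(pasted, { p: [], br: [], a: ['href'] });
console.log(JSON.stringify(cleaned));
```

The div and span wrappers disappear but their text stays where it was, the p loses its style, and the a keeps only href.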

// ECMAScript Fifth Edition (and JavaScript 1.6) array methods used by `filterNodes`.
// Because not all browsers have these natively yet, bodge in support if missing.
if (!('indexOf' in Array.prototype)) {
    Array.prototype.indexOf= function(find, ix /*opt*/) {
        for (var i= ix || 0, n= this.length; i<n; i++)
            if (i in this && this[i]===find)
                return i;
        return -1;
    };
}
if (!('forEach' in Array.prototype)) {
    Array.prototype.forEach= function(action, that /*opt*/) {
        for (var i= 0, n= this.length; i<n; i++)
            if (i in this)
                action.call(that, this[i], i, this);
    };
}

// Utility function used by filterNodes. This is really just `Array.prototype.slice()`
// except that the ECMAScript standard doesn't guarantee we're allowed to call that on
// a host object like a DOM NodeList, boo.
Array.fromList= function(list) {
    var array= new Array(list.length);
    for (var i= 0, n= list.length; i<n; i++)
        array[i]= list[i];
    return array;
};
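
One more reason the copy matters, beyond the host-object caveat in the comment above (this is my reading, not something the answer states): childNodes is a live list, and filterNodes moves and removes children while walking it. The sketch below uses a plain array as a stand-in for a live NodeList to show what iterating without a snapshot does; `fromList` and `visitAndRemove` are local helper names made up for the demo.

```javascript
// Local copy of the answer's Array.fromList, so this sketch leaves Array untouched.
function fromList(list) {
    var array = new Array(list.length);
    for (var i = 0, n = list.length; i < n; i++)
        array[i] = list[i];
    return array;
}

// Visit every item of `source`, removing each visited item from `target`
// (roughly how filterNodes removes unwanted children from their parent).
function visitAndRemove(source, target) {
    var visited = [];
    source.forEach(function (item) {
        visited.push(item);
        target.splice(target.indexOf(item), 1);
    });
    return visited;
}

var live = ['a', 'b', 'c', 'd'];        // plain array standing in for a live list
var visitedLive = visitAndRemove(live, live);

var live2 = ['a', 'b', 'c', 'd'];
var visitedSnapshot = visitAndRemove(fromList(live2), live2);

console.log(visitedLive.join(','));      // a,c
console.log(visitedSnapshot.join(','));  // a,b,c,d
```

Iterating the list that is being modified skips every other item; iterating the snapshot visits all of them.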
bobince answered May 23 '23 01:05
