
How to remove all attributes from HTML?

I have raw HTML in which various tags carry CSS classes and other attributes.

Example:

Input:

<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

and I would like to get plain HTML without the attributes, like:

Output:

<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>

I do not know the names of these classes in advance. I need to do this in JavaScript (Node.js).

Any idea?

Pavel Binar asked Jan 08 '14 18:01



2 Answers

This can be done with Cheerio, as I noted in the comments.
To remove all attributes from all elements, you'd do:

var cheerio = require('cheerio'); // npm install cheerio

var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';

var $ = cheerio.load(html);   // load the HTML

$('*').each(function() {      // iterate over all elements
    this.attribs = {};        // remove all attributes
});

var cleaned = $.html();       // get the HTML back, now without attributes
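
To see why resetting attribs is enough, here is a minimal dependency-free sketch. It assumes element nodes shaped like the ones Cheerio exposes (a type, a name, an attribs object and a children array). The hand-built tree and the helper names stripAttributes and serialize are hypothetical and only for illustration, and the serializer is deliberately simplified (no escaping, no void elements), so it is not a replacement for $.html().

```javascript
// Recursively drop attributes from a node and all of its descendants.
// The node shape (type / name / attribs / children / data) is an assumption
// modelled on htmlparser2-style nodes; no library is loaded here.
function stripAttributes(node) {
  if (node.type === 'tag') {
    node.attribs = {};                              // remove every attribute
  }
  (node.children || []).forEach(function (child) {  // text nodes have no children
    stripAttributes(child);
  });
  return node;
}

// Simplified serializer: no escaping and no void elements, illustration only.
function serialize(node) {
  if (node.type === 'text') {
    return node.data;
  }
  var attrs = Object.keys(node.attribs).map(function (key) {
    return ' ' + key + '="' + node.attribs[key] + '"';
  }).join('');
  var inner = (node.children || []).map(function (child) {
    return serialize(child);
  }).join('');
  return '<' + node.name + attrs + '>' + inner + '</' + node.name + '>';
}

// Hypothetical hand-built tree standing in for parsed HTML.
var tree = {
  type: 'tag',
  name: 'p',
  attribs: { 'class': 'opener', itemprop: 'description' },
  children: [
    { type: 'text', data: 'Lorem ' },
    {
      type: 'tag',
      name: 'b',
      attribs: { id: 'x' },
      children: [{ type: 'text', data: 'ipsum' }]
    }
  ]
};

var result = serialize(stripAttributes(tree));
console.log(result); // prints <p>Lorem <b>ipsum</b></p>
```

Because the walk is recursive, attributes on nested elements are removed as well, which is the same effect the '*' selector has in the Cheerio version.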
adeneo answered Sep 24 '22 21:09



I would create a new element with the same tag name and copy the old element's innerHTML into it. You can then replace the old element with the new one, or use newEl however you like, as in the code below:

// Get the current element
var el = document.getElementsByTagName('p')[0];

// Create a new element (in this case, a <p> tag)
var newEl = document.createElement(el.nodeName);

// Assign the new element the contents of the old tag
newEl.innerHTML = el.innerHTML;

// Replace the old element with newEl (or use newEl however you like)
el.parentNode.replaceChild(newEl, el);
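
Two caveats apply to this approach: copying innerHTML keeps whatever attributes the nested elements carry, and document exists only in a browser (or a DOM library), not in plain Node.js. A possible in-place alternative is sketched below. It relies only on the standard DOM members attributes, removeAttribute and children; stripAllAttributes is a hypothetical helper name, not a built-in.

```javascript
// Browser-side sketch: strip attributes from an element and all of its descendants.
function stripAllAttributes(el) {
  // el.attributes is a live collection, so snapshot the names before removing
  var names = Array.prototype.map.call(el.attributes, function (attr) {
    return attr.name;
  });
  names.forEach(function (name) {
    el.removeAttribute(name);
  });
  // el.children contains element children only (text nodes are skipped)
  Array.prototype.forEach.call(el.children, function (child) {
    stripAllAttributes(child);
  });
  return el;
}

// Usage in a page (commented out because document is not defined in Node.js):
// stripAllAttributes(document.getElementsByTagName('p')[0]);
```

Unlike the copy-and-replace version, this keeps the original element in the document, so existing references to it stay valid.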
MattDiamant answered Sep 23 '22 21:09
