I have raw HTML in which various tags carry CSS classes and other attributes.
Example:
Input:
<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>
and I would like to get just plain HTML, like this:
Output:
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>
I do not know the names of these classes in advance. I need to do this in JavaScript (Node.js).
Any idea?
This can be done with Cheerio, as I noted in the comments.
To remove all attributes on all elements, you'd do:
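var cheerio = require('cheerio'); // npm install cheerio
var html = '<p class="opener" itemprop="description">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Neque molestias natus iste labore a accusamus dolorum vel.</p>';
var $ = cheerio.load(html); // load the HTML
$('*').each(function() { // iterate over all elements
    this.attribs = {}; // remove all attributes
});
// Get the HTML back. Newer Cheerio versions may wrap a fragment in
// <html><head></head><body>...</body></html>; $('body').html() returns just the fragment.
html = $.html();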
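To make it clearer what that loop does, here is a minimal sketch of the same idea without Cheerio. It assumes a simplified, hypothetical node shape (`name`, `attribs`, `children`, or `text` for text nodes) that only resembles what an HTML parser produces; `stripAttributes` and `serialize` are illustrative names, not Cheerio functions, and the serializer does no escaping.

```javascript
// Hypothetical, simplified node shape (not Cheerio's real classes):
//   element: { name: 'p', attribs: {...}, children: [...] }
//   text:    { text: '...' }
function stripAttributes(node) {
    if (node.attribs) {
        node.attribs = {}; // same reset the Cheerio loop does per element
    }
    (node.children || []).forEach(stripAttributes); // recurse into descendants
    return node;
}

// Turn the simplified tree back into an HTML string (sketch only, no escaping).
function serialize(node) {
    if (node.text !== undefined) {
        return node.text;
    }
    var attrs = Object.keys(node.attribs).map(function (key) {
        return ' ' + key + '="' + node.attribs[key] + '"';
    }).join('');
    var inner = node.children.map(serialize).join('');
    return '<' + node.name + attrs + '>' + inner + '</' + node.name + '>';
}

var tree = {
    name: 'p',
    attribs: { 'class': 'opener', itemprop: 'description' },
    children: [
        { text: 'Lorem ' },
        { name: 'b', attribs: { 'class': 'x' }, children: [{ text: 'ipsum' }] }
    ]
};

var result = serialize(stripAttributes(tree));
console.log(result);
```

The point of the sketch is that attributes live on each node separately from its tag name and children, so emptying that one property on every element is enough to get attribute-free markup when the tree is serialized again.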
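I would create a new element using the tag name and the innerHTML of the original element (this needs a DOM, so a browser or a DOM library in Node.js). Because innerHTML is copied verbatim, only the outer element loses its attributes; nested elements keep theirs. You can then replace the old element with the new one, or do whatever you like with newEl, as in the code below: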
// Get the current element
var el = document.getElementsByTagName('p')[0];
// Create a new element with the same tag name (in this case, a <p> tag)
var newEl = document.createElement(el.nodeName);
// Assign the new element the contents of the old tag
newEl.innerHTML = el.innerHTML;
// Replace the old element with newEl, or do whatever you like with it
el.parentNode.replaceChild(newEl, el);
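A variation on the same idea, again assuming a DOM is available, is to remove the attributes in place instead of building a new element, which also covers nested elements. This is only a sketch: `stripElementAttributes` and `stripTreeAttributes` are hypothetical helper names, while `attributes`, `removeAttribute` and `querySelectorAll` are standard DOM APIs.

```javascript
// Remove every attribute from a single element, in place.
function stripElementAttributes(el) {
    // In a real DOM, el.attributes is a live collection, so keep removing
    // the first entry until none are left.
    while (el.attributes.length > 0) {
        el.removeAttribute(el.attributes[0].name);
    }
    return el;
}

// Remove attributes from a root element and from all of its descendants.
function stripTreeAttributes(root) {
    stripElementAttributes(root);
    var descendants = root.querySelectorAll('*');
    for (var i = 0; i < descendants.length; i++) {
        stripElementAttributes(descendants[i]);
    }
    return root;
}
```

In a browser this could be called as `stripTreeAttributes(document.getElementsByTagName('p')[0]);`, after which the element's `outerHTML` should contain no attributes.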