Remove all attributes from html tags

Tags:

php

I have this html code:

<p style="padding:0px;">   <strong style="padding:0;margin:0;">hello</strong> </p> 

How can I remove attributes from all tags? I'd like it to look like this:

<p>   <strong>hello</strong> </p> 

Andres SK asked Jun 11 '10 20:06



2 Answers

Adapted from my answer on a similar question

    $text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

    echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si", '<$1$2>', $text);

    // <p><strong>hello</strong></p>

The RegExp broken down:

    /              # Start Pattern
     <             # Match '<' at beginning of tags
     (             # Start Capture Group $1 - Tag Name
      [a-z]        # Match 'a' through 'z'
      [a-z0-9]*    # Match 'a' through 'z' or '0' through '9' zero or more times
     )             # End Capture Group
     [^>]*?        # Match anything other than '>', zero or more times, not greedy (won't eat the /)
     (\/?)         # Capture Group $2 - '/' if it is there
     >             # Match '>'
    /is            # End Pattern - i: case insensitive; s: '.' also matches newlines (no effect here, the pattern has no '.')

Quote the pattern as a PHP string and use the replacement text <$1$2>. This strips everything after the tag name up to the end of the tag, whether that is /> or just >.
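To make the role of the two capture groups concrete, here is a small sketch that runs the same pattern over a few sample inputs. The sample strings (including the example.com URL) are made up for illustration and are not from the question.

```php
<?php
// Same pattern and replacement as in the answer above.
$pattern = '/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si';

// Illustrative inputs only; none of these come from the original question.
$samples = [
    '<br />',                                  // self-closing: the slash survives via $2
    '<IMG SRC="a.png" ALT="x">',               // upper case matches because of the i modifier
    '<a href="http://example.com/">link</a>',  // closing tags never match, so they pass through
];

foreach ($samples as $html) {
    echo preg_replace($pattern, '<$1$2>', $html), "\n";
}
// <br/>
// <IMG>
// <a>link</a>
```

Note that $1 keeps the tag name exactly as written, so the case of the original tag is preserved.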

Please note: this isn't necessarily going to work on ALL input, as the Anti-HTML + RegExp will tell you. There are a few pitfalls, most notably that <p style=">"> would end up as <p>">, plus a few other ways it can break. I would recommend looking at Zend_Filter_StripTags as a more foolproof tag/attribute filter in PHP.
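The failure mode mentioned above is easy to reproduce. The following sketch uses two made-up inputs: the first is the attribute value containing '>' from the note, the second (an HTML comment) is an extra case added here as an illustration of the same weakness.

```php
<?php
// Same pattern as in the answer above.
$pattern = '/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si';

// A '>' inside a quoted attribute value ends the match too early,
// leaving the rest of the attribute behind as text.
$broken = preg_replace($pattern, '<$1$2>', '<p style=">">text</p>');
echo $broken, "\n";   // <p>">text</p>

// The pattern knows nothing about comments, so tag-like text
// inside one is rewritten as well.
$comment = preg_replace($pattern, '<$1$2>', '<!-- <b class="x"> -->');
echo $comment, "\n";  // <!-- <b> -->
```

If the input can contain cases like these, a parser-based approach is the safer choice.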

gnarf answered Sep 24 '22 01:09


Here is how to do it with native DOM:

    $dom = new DOMDocument;                 // init new DOMDocument
    $dom->loadHTML($html);                  // load HTML into it
    $xpath = new DOMXPath($dom);            // create a new XPath
    $nodes = $xpath->query('//*[@style]');  // find elements with a style attribute
    foreach ($nodes as $node) {             // iterate over found elements
        $node->removeAttribute('style');    // remove style attribute
    }
    echo $dom->saveHTML();                  // output cleaned HTML

If you want to remove all possible attributes from all possible tags, do

    $dom = new DOMDocument;
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//@*');
    foreach ($nodes as $node) {
        $node->parentNode->removeAttribute($node->nodeName);
    }
    echo $dom->saveHTML();
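
One practical wrinkle: loadHTML() wraps a fragment in html and body elements, and saveHTML() without arguments serializes the whole document, doctype included. The sketch below is one possible way to get only the fragment back. The function name stripAllAttributes is hypothetical, and using ownerElement with removeAttributeNode() is a variation on the loop above, not the answer's exact code.

```php
<?php
// Hypothetical helper, for illustration only.
function stripAllAttributes(string $html): string
{
    // Recent PHP versions reject an empty source in loadHTML(), so bail out early.
    if (trim($html) === '') {
        return '';
    }

    // Collect parser warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every attribute node in the document and detach it from its element.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//@*') as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the children of <body>, skipping the doctype/html/body wrappers.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo stripAllAttributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>');
```

For the markup from the question this should print the paragraph and strong tags without attributes and without the document wrappers. Character encoding of non-ASCII input is not handled here.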

Gordon answered Sep 23 '22 01:09