The strip_tags() documentation tells us that all tags except those listed in the second parameter are stripped. The operation this function performs is the opposite of what its name says; it should have been named strip_all_tags_except().
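A small illustration of that allow-list behaviour (the input string is made up for the example):

```php
<?php
// strip_tags() removes every tag EXCEPT the ones named in its second argument.
$html = '<p>Hello <b>bold</b> and <i>italic</i></p>';
echo strip_tags($html, '<b>'), PHP_EOL;
// prints: Hello <b>bold</b> and italic
```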
Let's forget about the name and get to what I want to ask. I want a function that removes only the tags I list in the second parameter, i.e., I want the following call to strip the tags <iframe><script><style><embed><object> and allow all others:
my_strip_tags($data,'<iframe><script><style><embed><object>');
That's the exact opposite of what strip_tags() does.
How do I make this happen?
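One way to get the requested behaviour is to parse the markup with DOMDocument and delete only the listed elements. This is a minimal sketch assuming PHP 7+ with the DOM extension; my_strip_tags() is the hypothetical name from the question, not a built-in. Unlike strip_tags(), it removes each listed element together with its contents. It is still a blacklist, so attributes such as onclick on the tags it keeps pass through untouched.

```php
<?php
// Hypothetical my_strip_tags(): removes ONLY the listed elements (and their
// contents) and leaves every other tag in place. Blacklist only: it does
// nothing about dangerous attributes on the tags it keeps.
function my_strip_tags(string $html, string $tags): string
{
    // Turn '<iframe><script>' into ['iframe', 'script'].
    preg_match_all('/<([a-z][a-z0-9]*)>/i', $tags, $m);
    $blocked = array_map('strtolower', $m[1]);

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about HTML5 tags etc.
    // The XML encoding hint makes libxml read the input as UTF-8; the <div>
    // gives us a known wrapper whose children we serialize at the end.
    $dom->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($blocked as $tag) {
        // getElementsByTagName() returns a live list, so copy it before removing.
        foreach (iterator_to_array($dom->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of the wrapper div (the first div in the document).
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Called as in the question, my_strip_tags($data, '<iframe><script><style><embed><object>') should return the markup with those five elements removed; expect libxml to normalize what remains (for example, closing unclosed tags).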
Updated 2012-06-23 to fix a major security flaw.
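Here's a class from another project that should do what you're looking for. Rather than removing the tags you name, it keeps only a whitelist of safe tags and attributes, which is the safer approach: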
final class Filter {
    private function __construct() {}

    const SafeTags = 'a abbr acronym address b bdo big blockquote br caption center cite code col colgroup dd del dfn dir div dl dt em font h1 h2 h3 h4 h5 h6 hr i img ins kbd legend li ol p pre q s samp small span strike strong sub sup table tbody td tfoot th thead tr tt u ul var article aside figure footer header nav section rp rt ruby dialog hgroup mark time';
    const SafeAttributes = 'href src title alt type rowspan colspan lang';
    const URLAttributes = 'href src';

    public static function HTML($html) {
        # Get array representations of the safe tags and attributes:
        $safeTags = explode(' ', self::SafeTags);
        $safeAttributes = explode(' ', self::SafeAttributes);
        $urlAttributes = explode(' ', self::URLAttributes);

        # Parse the HTML into a document object. The XML encoding hint makes
        # libxml read the input as UTF-8, and its warnings about unknown
        # (e.g. HTML5) tags are suppressed:
        $dom = new DOMDocument();
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        # The first <div> is our wrapper; only its descendants are filtered,
        # so the <html> and <body> that libxml adds around it are left alone:
        $wrapper = $dom->getElementsByTagName('div')->item(0);

        # Loop through all of the element nodes below the wrapper:
        $stack = new SplStack();
        foreach($wrapper->childNodes as $child) {
            if($child instanceof DOMElement) {
                $stack->push($child);
            }
        }
        while($stack->count() > 0) {
            # Get the next element for processing:
            $element = $stack->pop();

            # Add all the element's child elements to the stack:
            foreach($element->childNodes as $child) {
                if($child instanceof DOMElement) {
                    $stack->push($child);
                }
            }

            # And now, we do the filtering:
            if(!in_array(strtolower($element->nodeName), $safeTags)) {
                # It's not a safe tag; unwrap it (its children stay and are filtered too):
                while($element->hasChildNodes()) {
                    $element->parentNode->insertBefore($element->firstChild, $element);
                }
                # Finally, delete the offending element:
                $element->parentNode->removeChild($element);
            } else {
                # The tag is safe; now filter its attributes. Collect the
                # offending names first and remove them afterwards, so the
                # removal doesn't disturb the iteration. URL attributes must
                # start with http:// or https://, which keeps javascript: out:
                $unsafe = array();
                foreach($element->attributes as $attribute) {
                    $name = strtolower($attribute->name);
                    if(!in_array($name, $safeAttributes)
                        || (in_array($name, $urlAttributes)
                            && !preg_match('#^https?://#i', $attribute->value))) {
                        $unsafe[] = $attribute->name;
                    }
                }
                foreach($unsafe as $name) {
                    $element->removeAttribute($name);
                }
            }
        }

        # Finally, return the safe HTML: only the wrapper's children, without
        # the DOCTYPE, <html>, <body> or the wrapper <div> itself:
        $output = '';
        foreach($wrapper->childNodes as $child) {
            $output .= $dom->saveHTML($child);
        }
        return $output;
    }
}
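The class only lets href and src through when they start with an explicit web URL scheme, which is what keeps javascript: and data: URLs out. The same whitelist idea can be factored into a standalone check built on parse_url(). is_safe_url() below is a hypothetical helper, a sketch assuming PHP 7+, that accepts http and https and rejects everything else, including relative URLs.

```php
<?php
// Hypothetical helper: accept a URL attribute value only if its scheme is
// explicitly http or https. Relative URLs and every other scheme
// (javascript:, data:, vbscript:, ...) are rejected.
function is_safe_url(string $value): bool
{
    $scheme = parse_url(trim($value), PHP_URL_SCHEME);
    if (!is_string($scheme)) {
        return false; // no scheme (relative URL) or a value parse_url() can't parse
    }
    return in_array(strtolower($scheme), ['http', 'https'], true);
}
```

Whitelisting the scheme, rather than blacklisting javascript:, means odd spellings of dangerous schemes are rejected by default.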
It shouldn't be done at all.
strip_tags() is only safe when used without its second parameter. Otherwise every tag you allow is an XSS vector, because strip_tags() copies allowed tags through verbatim, attributes included (e.g. <b onmouseover="...">).
In fact, your concern should be not only tags but also attributes. So use an HTML purifier of some sort instead.
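A short demonstration of the attribute problem with strip_tags() (the payload string is made up for the example):

```php
<?php
// strip_tags() with an allow-list drops disallowed tags, but allowed tags are
// copied through verbatim, event-handler attributes included.
$input  = '<b onmouseover="alert(document.cookie)">hover me</b>';
$output = strip_tags($input, '<b>');
echo $output, PHP_EOL;
// prints: <b onmouseover="alert(document.cookie)">hover me</b>
```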
I usually work with the htmLawed library, which you can use to filter, secure, and sanitize HTML:
http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/more.htm