
How to strip tags in a safer way than using the strip_tags function?

I'm having some problems using PHP's strip_tags function when the string contains 'less than' and 'greater than' signs. For example:

If I do:

strip_tags("<span>some text <5ml and then >10ml some text </span>");

I'll get:

some text 10ml some text

But, obviously I want to get:

some text <5ml and then >10ml some text

Yes, I know that I could use &lt; and &gt;, but I can't convert those characters into HTML entities because the data is already stored, as you can see in my example.

What I'm looking for is a clever way to parse the HTML so that only actual HTML tags are removed.

Since TinyMCE was used to generate that data, I know which HTML tags could actually appear, so a strip_tags($string, $black_list) implementation would be more useful than strip_tags($string, $allowable_tags).

Any thoughts?

texai asked Feb 14 '11 18:02



3 Answers

As a wacky workaround you could filter non-html brackets with:

$html = preg_replace_callback("# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi", function ($m) { return htmlentities($m[0]); }, $html);

Apply strip_tags() afterwards. Note that this only works for your specific example and similar cases. It's a regular expression with some heuristics, not artificial intelligence that can tell HTML tags apart from unescaped angle brackets with another meaning.
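To make the order of operations concrete, here is a minimal sketch of the whole round trip for the question's example string. The function name strip_tags_keep_brackets is made up for illustration, and the final html_entity_decode() step is an assumption: it only makes sense if the result is wanted as plain text and will be escaped again before being written into an HTML page.

```php
<?php
// Hypothetical helper (not part of PHP): escape stray brackets, strip tags, decode.
function strip_tags_keep_brackets(string $html): string
{
    // Same heuristic as above: "<" not followed by "/" or a letter,
    // and ">" preceded by whitespace and not followed by a letter.
    $escaped = preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        function (array $m): string {
            return htmlentities($m[0]);
        },
        $html
    );

    // Only things that still look like tags are left for strip_tags().
    $text = strip_tags($escaped);

    // Assumption: plain text is wanted, so turn &lt; and &gt; back into < and >.
    // This also decodes any other entities already present in the data.
    return html_entity_decode($text, ENT_QUOTES, 'UTF-8');
}

echo strip_tags_keep_brackets("<span>some text <5ml and then >10ml some text </span>");
```

The same caveat applies: a value such as "a <b and c" still looks like the start of a tag to the heuristic, so it is worth testing against a sample of the real stored data first.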

mario answered Sep 26 '22 11:09


If you want to have "greater than" and "less than" signs, you need to escape them:

&gt; is >

&lt; is <

See e.g. this: http://www.w3schools.com/html/html_entities.asp

Piskvor left the building answered Sep 23 '22 11:09


Instead of strip_tags(), just use htmlspecialchars().
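It helps to see what this does to the question's string: htmlspecialchars() escapes every angle bracket, so the span tags are not removed but become visible text. A small sketch, where $stored and $escaped are illustrative variable names:

```php
<?php
// The stored value from the question, with real tags and stray brackets mixed.
$stored = "<span>some text <5ml and then >10ml some text </span>";

// Escape everything; nothing is stripped, tags are rendered literally.
$escaped = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo $escaped;
// prints: &lt;span&gt;some text &lt;5ml and then &gt;10ml some text &lt;/span&gt;
```

So this is safe to output, but it only fits if showing the literal tags is acceptable; it does not by itself remove them.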

http://php.net/manual/en/function.htmlspecialchars.php

dqhendricks answered Sep 24 '22 11:09