I would like to remove all content (between tags) from an HTML string. Is there an elegant way to do this without writing a complex regex?
In other words, I am looking for the opposite of what strip_tags() does.
Suggestions?
This solution uses regex. I will let you decide if it is complex or not.
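$out = preg_replace("/(?<=^|>)[^<]*/", "", $in);
Let's break it down:
(?<=^|>): A lookbehind. It is not part of the match, but it must be satisfied: the match has to start either at the beginning of the string (^) or right after a literal >.
[^<]*: Matches any run of characters other than <, newlines included. Because it stops at the next <, a match can never extend into a tag.
Everything matched is replaced by nothing (""), so all text between > and <, as well as any text before the first tag or after the last one, is deleted.
A lazy .*? followed by a (?=<|$) lookahead looks equivalent but is not safe here: after an empty match (at the start of a string that begins with <, or between two adjacent tags), preg_replace retries at the same position requiring a non-empty match, and .*? then runs straight across the next tag and deletes it.
Whitespace between tags, including newlines, is removed too, so the result ends up on one long line.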
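A small self-contained script for trying the pattern out. The function name strip_text() and the sample input are hypothetical, chosen only for illustration. The pattern matches the text between tags with [^<]*, so a single match can never run across a <, even next to adjacent tags.

```php
<?php
// Hypothetical helper: keeps every tag and drops all text outside of tags.
function strip_text(string $html): string
{
    // (?<=^|>) : start of the string, or the position right after a '>'
    // [^<]*    : the run of text up to the next '<' (newlines included)
    return preg_replace('/(?<=^|>)[^<]*/', '', $html);
}

$html = "<p>Hello <b>World</b>!</p>\n<div>\n  Bye\n</div>";
echo strip_text($html), PHP_EOL; // prints <p><b></b></p><div></div>
```

Note that the newline and indentation between </p> and <div> disappear along with the visible text, which is why the output collapses onto one line.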
EDIT: If you know that your input will always be wrapped in HTML tags, you can make it even simpler, since you don't have to handle the beginning and end of the string:
$out = preg_replace("/>.*?</s", "><", $in);
This variant does not remove text at the very beginning or end of the input: for instance, Hello <b>World</b>! becomes Hello <b></b>!.
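A quick way to see both the simpler variant and its limitation side by side. The function name strip_text_wrapped() is hypothetical; the pattern and replacement are the ones from the EDIT.

```php
<?php
// Hypothetical helper wrapping the EDIT's pattern; assumes input starts and ends with a tag.
function strip_text_wrapped(string $html): string
{
    // '>' followed by the shortest run of characters up to the next '<'.
    // Both delimiters are consumed, so the replacement puts them back.
    return preg_replace('/>.*?</s', '><', $html);
}

echo strip_text_wrapped('<p>Hello <b>World</b>!</p>'), PHP_EOL; // prints <p><b></b></p>
echo strip_text_wrapped('Hello <b>World</b>!'), PHP_EOL;        // prints Hello <b></b>!
```

The second call shows the limitation: the leading Hello survives because no > precedes it, and the trailing ! survives because no < follows it.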