I would like to remove all content (between tags) from an HTML string. Is there an elegant way to do this without writing a complex regex?
Put another way, I am looking for the opposite of what strip_tags() does.
Suggestions?
This solution uses regex. I will let you decide if it is complex or not.
$out = preg_replace("/(?<=^|>).*?(?=<|$)/s", "", $in);
Let's break it down:
(?<=^|>) : A lookbehind. It is not part of the match, but it still has to be there. Matches either the beginning of the string (^) or a literal >.
.*? : Matches anything (the s modifier makes it include newlines). The question mark makes it lazy: it matches as few characters as possible.
(?=<|$) : A lookahead. Matches either a literal < or the end of the string ($).
This is replaced by nothing (""), so that everything between > and < is deleted. A working demo can be seen here. It does not preserve whitespace, so you end up with one super long line.
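To see the pattern in action, here is a minimal sketch that wraps it in a function. The name strip_tag_content and the sample input are made up for illustration and are not part of the answer above. The pattern is written in single quotes so that PHP never tries to interpolate the $.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the answer).
function strip_tag_content(string $in): string
{
    // Remove every run of characters that sits between ">" (or the start
    // of the string) and "<" (or the end of the string).
    // preg_replace() returns null on failure, so fall back to the input.
    return preg_replace('/(?<=^|>).*?(?=<|$)/s', '', $in) ?? $in;
}

// Illustrative input: text before, between, and after the tags.
$in = "Intro <p>Hello <b>World</b>!</p>\n<br/> outro";

echo strip_tag_content($in), "\n"; // prints <p><b></b></p><br/>
```

Only the tags survive, and the newline between </p> and <br/> is gone as well, which is the "one super long line" effect mentioned above.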
EDIT: If you know that your input will always be wrapped in HTML tags, you can make it even simpler for yourself, since you don't have to handle the beginning and end of the string:
$out = preg_replace("/>.*?</s", "><", $in);
This variant will not work for input with text at the beginning or the end. For instance, Hello <b>World</b>! will become Hello <b></b>!, with the leading and trailing text left in place.
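A short side-by-side sketch of the two patterns on that same input may make the difference clearer. The variable names are chosen here for illustration only.

```php
<?php
$in = "Hello <b>World</b>!";

// Lookaround variant: also removes text before the first tag
// and after the last tag.
$full = preg_replace('/(?<=^|>).*?(?=<|$)/s', '', $in);

// Simplified variant: only removes text that sits between a ">"
// and the next "<", so leading and trailing text is left alone.
$simple = preg_replace('/>.*?</s', '><', $in);

echo $full, "\n";   // prints <b></b>
echo $simple, "\n"; // prints Hello <b></b>!
```

If the input might start or end with plain text, the lookaround variant is the safer choice.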