 

Remove content between HTML tags in PHP?

Tags: html, dom, php

I would like to remove all content between tags from an HTML string. Is there an elegant way to do this without writing a complex regex?

In other words, I am looking for the opposite of what strip_tags() does.

Suggestions?
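One regex-free alternative, sketched here as an assumption rather than taken from this thread, is to parse the string with PHP's DOM extension (ext-dom) and delete every text node. The helper name removeTextNodes() is hypothetical. The sketch assumes the fragment has a single root element: loadHTML() adds implied html/body wrappers, and libxml may wrap loose leading text in an extra element.

```php
<?php
// Sketch: keep the markup, drop all text, using ext-dom instead of a regex.
// removeTextNodes() is a hypothetical helper name.
// Assumes a fragment with a single root element such as <div>...</div>.
function removeTextNodes(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Snapshot every text node first, then detach each one from its parent.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//text()')) as $textNode) {
        $textNode->parentNode->removeChild($textNode);
    }

    // loadHTML() adds implied <html> and <body> elements, so serialize
    // only the children of <body> to get the fragment back.
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

echo removeTextNodes('<div><p>Hello</p><p>World</p></div>'), "\n";
```

A parser is more tolerant than a regex of > characters inside attribute values or comments, at the cost of depending on how libxml normalizes the markup.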

asked Aug 17 '15 at 18:08 by gaekaete



1 Answer

This solution uses regex. I will let you decide if it is complex or not.

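$out = preg_replace("/(?<=^|>)[^<]+/", "", $in);

Let's break it down:

  • (?<=^|>): A lookbehind. It is not part of the match, but it must be satisfied: the match has to start either at the beginning of the string (^) or right after a literal >.
  • [^<]+: One or more characters that are not <, newlines included. Because the class cannot cross a <, each match stops at the next tag or at the end of the string.

Each match is replaced with nothing (""), so all text between > and <, as well as any text before the first tag or after the last one, is deleted. Whitespace between tags counts as text too, so indentation and line breaks disappear and the output ends up on one long line.

A lazy .*? followed by the lookahead (?=<|$) looks equivalent, but it is not safe with preg_replace: after an empty match, PHP retries at the same position and accepts a non-empty match there, and .*? is then free to run across a tag. For example, "<p>Hello</p>" comes out as "</p>".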
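A minimal runnable sketch of this approach, wrapped in a hypothetical helper removeTextBetweenTags(). It matches each text run with [^<]+ rather than a lazy .*?, so a match can never extend across a tag. Like any regex over HTML, it assumes no > appears inside attribute values, comments, or <script> blocks.

```php
<?php
// Sketch: delete every run of text outside a tag, keeping the tags themselves.
// removeTextBetweenTags() is a hypothetical helper name.
// Assumption: no '>' inside attribute values, comments or <script> blocks.
function removeTextBetweenTags(string $html): string
{
    // (?<=^|>)  the run must start at the beginning of the string or right after '>'
    // [^<]+     one or more characters that are not '<' (newlines included)
    return preg_replace('/(?<=^|>)[^<]+/', '', $html);
}

echo removeTextBetweenTags("<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>"), "\n";
// prints: <ul><li></li><li></li></ul>
```

The + quantifier means the pattern never produces an empty match, which sidesteps preg_replace's special handling of empty matches altogether.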

EDIT: If you know that your input will always be wrapped in HTML tags, you can make it even simpler, since you don't have to handle the beginning and end of the string:

$out = preg_replace("/>.*?</s", "><", $in);

This variant will not work for input with text at the beginning or the end: for instance, Hello <b>World</b>! becomes Hello <b></b>!, because only text that sits between a > and the following < is removed.
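A small sketch of the simpler variant, using the same pattern and replacement as above inside a hypothetical helper removeTextInsideWrappedHtml(). It shows both a fully wrapped input and the edge case with loose text at the start and end.

```php
<?php
// Sketch of the simpler variant: replace every ">...<" run with "><".
// removeTextInsideWrappedHtml() is a hypothetical helper name.
function removeTextInsideWrappedHtml(string $html): string
{
    // >    a literal '>' (end of a tag)
    // .*?  as few characters as possible; the s modifier lets '.' match newlines
    // <    the next literal '<' (start of the following tag)
    return preg_replace('/>.*?</s', '><', $html);
}

echo removeTextInsideWrappedHtml('<p>Hello <b>World</b></p>'), "\n";
// prints: <p><b></b></p>
echo removeTextInsideWrappedHtml('Hello <b>World</b>!'), "\n";
// prints: Hello <b></b>!
```

Because every match must contain both a > and a <, this pattern can never match an empty string, so the lazy .*? cannot run past the next tag here.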

answered Oct 03 '22 at 15:10 by Anders
