I'm parsing some messy HTML code with PHP in which there are some redundant
tags and I would like to clean them up a bit. For instance:
<br>
<br /><br />
<br>
How would I replace something like that with the following, using preg_replace()?
<br /><br />
Newlines, spaces, and the differences between <br>, <br/>, and <br /> would all have to be accounted for.
Edit: Basically I'd like to replace every instance of three or more successive breaks with just two.
Here is something you can use. The first line finds any run of two or more <br> tags (in any of the variants, with optional whitespace between them) and replaces it with a well-formed <br /><br />.
I also included the second line to normalize the remaining <br> tags to <br />, if you want that too.
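function clean($txt)
{
    // Collapse any run of two or more <br> variants into exactly two.
    $txt = preg_replace('{(<br\s*(>|/>)\s*){2,}}i', '<br /><br />', $txt);
    // Rewrite every remaining <br> variant (and trailing whitespace) as <br />.
    $txt = preg_replace('{(<br\s*(>|/>)\s*)}i', '<br />', $txt);
    return $txt;
}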
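As a quick check of the two-pass idea above, here is a minimal sketch that runs it on input similar to the sample in the question. The function name `collapse_breaks` and the sample string are assumptions made for illustration; the patterns are meant to be equivalent to the ones in the answer, just written with `~` delimiters.

```php
<?php
// Hypothetical helper name; same two-pass logic as the answer above.
function collapse_breaks(string $html): string
{
    // Pass 1: any run of two or more <br> variants becomes exactly two.
    $html = preg_replace('~(?:<br\s*/?>\s*){2,}~i', '<br /><br />', $html) ?? $html;
    // Pass 2: rewrite every remaining <br> variant as <br />.
    // Note: this also swallows whitespace that directly follows a break.
    return preg_replace('~<br\s*/?>\s*~i', '<br />', $html) ?? $html;
}

$messy = "one<br>\n<br /><br />\n<BR>two<br/>three";
echo collapse_breaks($messy), "\n";
// prints: one<br /><br />two<br />three
```

Note that this version rewrites every break in the document, including runs of exactly two, so it does slightly more than the question asks for.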
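This should work, using a quantifier with a minimum of three (with \s* and the i flag so that extra whitespace and uppercase tags are covered as well):
$multibreaks = preg_replace('/(?:<br\s*\/?>\s*){3,}/i', '<br /><br />', $multibreaks);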
It should match appalling constructions like <br><br /><br/><br> too.
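To see how the "three or more" threshold behaves, here is a small sketch built on a pattern along the lines of the one above. The wrapper name `limit_breaks` and the sample strings are assumptions for illustration only; the pattern assumes plain break tags without attributes.

```php
<?php
// Hypothetical wrapper; only runs of three or more breaks are touched.
function limit_breaks(string $html): string
{
    return preg_replace('~(?:<br\s*/?>\s*){3,}~i', '<br /><br />', $html) ?? $html;
}

$samples = [
    "a<br><br /><br/><br>b",  // four mixed breaks, collapsed to two
    "a<br>\n<br />\n<BR/>b",  // three breaks separated by newlines
    "a<br><br>b",             // only two breaks, left as written
];
foreach ($samples as $s) {
    echo limit_breaks($s), "\n";
}
// prints:
// a<br /><br />b
// a<br /><br />b
// a<br><br>b
```

Shorter runs keep their original spelling here, so a separate normalization pass would still be needed if every tag should end up as <br />. A tag such as <br class="x"> is not matched by this pattern.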