My string is an HTML document. I want to add a dot before closing HTML tags when there is no punctuation right before them. Punctuation is .,?!: and I want to use preg_replace for that.
<p>Today, not only we have so many breeds that are trained this and that.</p>
<h4><strong>We must add a dot after the closing strong</strong></h4>
<p>Hunting with your dog is a blah blah with each other.</p>
<h2>No need to change this one!</h2>
<p>Hunting with your dog is a blah blah with each other.</p>
My function:
$source = 'the above html';
$source = addMissingPunctuation( $source );
echo $source;
function addMissingPunctuation( $input ) {
    $tags = [ 'h1', 'h2', 'h3', 'h4', 'h5', 'h6' ];
    foreach ($tags as $tag) {
        $input = preg_replace(
            "/[^,.;!?](<\/".$tag.">)/mi",
            ".${0}",
            $input
        );
    }
    return $input;
}
I tried .${0}, .$0, .${1}, .$1, .\\0 and .\\1, but nothing works. At best, it swallows the match but doesn't replace it with anything. The matching part of my pattern seems to work on regex101 and other sites.
Desired result is:
<p>Today, not only we have so many breeds that are trained this and that.</p>
<h4><strong>We must add a dot after the closing strong</strong>.</h4>
<p>Hunting with your dog is a blah blah with each other.</p>
<h2>No need to change this one!</h2>
<p>Hunting with your dog is a blah blah with each other.</p>
You don't need to iterate over the $tags like that. I'd either do an implode with |, or in this case just write one rule that covers all the possible elements.
$source = '<p>Today, not only we have so many breeds that are trained this and that.</p>
<h4><strong>We must add a dot after the closing strong</strong></h4>
<p>Hunting with your dog is a blah blah with each other.</p>
<h2>No need to change this one!</h2>
<p>Hunting with your dog is a blah blah with each other.</p>';
$source = addMissingPunctuation( $source );
echo $source;
function addMissingPunctuation( $input ) {
    return preg_replace("/[^,.;!?]\K<\/h[1-6]>/mi", ".$0", $input);
}
Demo: https://3v4l.org/6dNV7
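If you would rather keep the tag list configurable, the implode variant mentioned above could look roughly like this. It is only a sketch: the function name addMissingPunctuationFor and its $tags parameter are made up for illustration, and it assumes PHP 7.4+ for the arrow function.

```php
<?php
// Hypothetical helper (not from the original code): same idea, but the tag list is a parameter.
function addMissingPunctuationFor(string $input, array $tags): string
{
    // Escape each tag name for use inside a regex that uses "/" as its delimiter,
    // then join the names into one alternation: h1|h2|h3|...
    $alternation = implode('|', array_map(fn($tag) => preg_quote($tag, '/'), $tags));

    // [^,.;!?] requires a non-punctuation character before the closing tag;
    // \K drops that character from the match so only the closing tag is replaced.
    $pattern = '/[^,.;!?]\K<\/(?:' . $alternation . ')>/i';

    return preg_replace($pattern, '.$0', $input);
}

$tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'];
echo addMissingPunctuationFor('<h4><strong>Title</strong></h4><h2>Done!</h2>', $tags);
```

One regex with an alternation means a single pass over the string instead of one preg_replace call per tag.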
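You also need to keep whatever character came before the closing tag out of the replacement; the \K does that by resetting the start of the match. Inside a double-quoted string, ${} is PHP variable syntax, whereas $0 is the regex reference to the whole match. It might be clearer to write it as \0 in the future (in a double-quoted string that has to be \\0, since a bare \0 is read as an octal escape).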
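To make the effect of the \K visible, here is a small comparison on a made-up input string. The variable names are only for illustration; single-quoted strings are used so PHP does not touch the $0, $1 and $2 references.

```php
<?php
$html = '<h4>Title</h4>';

// Without \K the whole match is "e</h4>", so the dot is placed before the "e".
$withoutK = preg_replace('~[^,.;!?]</h[1-6]>~i', '.$0', $html);

// With \K the reported match is only "</h4>", so the dot goes right before the tag.
$withK = preg_replace('~[^,.;!?]\K</h[1-6]>~i', '.$0', $html);

// Alternative without \K: capture both parts and put the dot between them.
$withGroups = preg_replace('~([^,.;!?])(</h[1-6]>)~i', '$1.$2', $html);

echo $withoutK, "\n";   // <h4>Titl.e</h4>
echo $withK, "\n";      // <h4>Title.</h4>
echo $withGroups, "\n"; // <h4>Title.</h4>
```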
Regex demo: https://regex101.com/r/xUvvuf/1/
(Example using the \0. https://3v4l.org/jGZal)
Another approach is to skip every closing tag that already has punctuation before it; this cuts down the number of steps the regex engine takes.
https://regex101.com/r/xUvvuf/2/
[,.;!?]<\/h[1-6]>(*SKIP)(*FAIL)|<\/h[1-6]>
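Plugged into preg_replace, that pattern could be used along these lines. This is a sketch with a made-up function name (addMissingPunctuationSkip), written with ~ as the delimiter so the / needs no escaping.

```php
<?php
// Hypothetical variant of addMissingPunctuation() using the (*SKIP)(*FAIL) pattern.
function addMissingPunctuationSkip(string $input): string
{
    // First alternative: a closing heading tag that already follows punctuation.
    // (*SKIP)(*FAIL) discards that match and resumes searching after it.
    // Second alternative: any remaining closing heading tag, which gets the dot.
    $pattern = '~[,.;!?]</h[1-6]>(*SKIP)(*FAIL)|</h[1-6]>~i';

    return preg_replace($pattern, '.$0', $input);
}

echo addMissingPunctuationSkip('<h4><strong>Title</strong></h4><h2>Done!</h2>');
```

Because nothing precedes the tag inside the second alternative, $0 is already just the closing tag and no \K is needed here.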
You could also change the delimiter; this is more a matter of personal preference, though. If you don't mind escaping each /, you can keep doing that; if not, just swap the leading and closing / for a ~.
Demo: https://regex101.com/r/xUvvuf/3/
preg_replace("~[^,.;!?]\K</h[1-6]>~mi", ".$0", $input);