
Adding missing punctuation before a closing HTML tag

My string is an HTML document. I want to add a dot before a closing HTML tag when there is no punctuation right before it. Punctuation means .,?!: and I want to use preg_replace for this.

<p>Today, not only we have so many breeds that are trained this and that.</p>

<h4><strong>We must add a dot after the closing strong</strong></h4>

<p>Hunting with your dog is a blah blah with each other.</p>

<h2>No need to change this one!</h2>

<p>Hunting with your dog is a blah blah with each other.</p>

My function:

$source = 'the above html';
$source = addMissingPunctuation( $source );

echo $source;

function addMissingPunctuation( $input ) {

    $tags = [ 'h1', 'h2', 'h3', 'h4', 'h5', 'h6' ];

    foreach ($tags as $tag) {

        $input = preg_replace(
            "/[^,.;!?](<\/".$tag.">)/mi",
            ".${0}",
            $input
        );

    }

    return $input;
}

I tried .${0}, .$0, .${1}, .$1, .\\0 and .\\1, but nothing works. At best, it swallows the match but doesn't replace it with anything. The matching part of my pattern seems to work on regex101 and other sites.

Desired result is:

<p>Today, not only we have so many breeds that are trained this and that.</p>

<h4><strong>We must add a dot after the closing strong</strong>.</h4>

<p>Hunting with your dog is a blah blah with each other.</p>

<h2>No need to change this one!</h2>

<p>Hunting with your dog is a blah blah with each other.</p>
Lazhar asked May 15 '26 05:05


1 Answer

You don't need to iterate over the $tags like that. I'd either implode them with | into a single alternation or, in this case, just write one rule that covers all the heading elements.

$source = '<p>Today, not only we have so many breeds that are trained this and that.</p>

<h4><strong>We must add a dot after the closing strong</strong></h4>

<p>Hunting with your dog is a blah blah with each other.</p>

<h2>No need to change this one!</h2>

<p>Hunting with your dog is a blah blah with each other.</p>';
$source = addMissingPunctuation( $source );
echo $source;
function addMissingPunctuation( $input ) {
    return preg_replace("/[^,.;!?]\K<\/h[1-6]>/mi", ".$0", $input);
}

Demo: https://3v4l.org/6dNV7
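
The implode variant mentioned at the top of this answer is not shown above. A minimal sketch of how it might look, assuming PHP 7.4+ for the arrow function; the name addMissingPunctuationForTags is made up for this illustration and is not from the question:

```php
<?php
// Hypothetical helper: one alternation built from the tag list, no loop over patterns.
function addMissingPunctuationForTags(string $input, array $tags): string
{
    // preg_quote() escapes regex metacharacters; '~' is the delimiter used below.
    $alternation = implode('|', array_map(
        fn (string $tag): string => preg_quote($tag, '~'),
        $tags
    ));

    // \K keeps the preceding character out of the match, so $0 is only the closing tag.
    $pattern = '~[^,.;!?]\K</(?:' . $alternation . ')>~i';

    // Single quotes: PHP passes '.$0' to preg_replace untouched.
    return preg_replace($pattern, '.$0', $input);
}

$html = '<h4><strong>Add a dot</strong></h4><h2>Leave this one!</h2>';
echo addMissingPunctuationForTags($html, ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']);
// prints: <h4><strong>Add a dot</strong>.</h4><h2>Leave this one!</h2>
```

This keeps the configurable $tags array from the question while still running a single preg_replace call.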

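You also need to keep whatever character comes before the closing tag out of the match; the \K does that. In a double-quoted PHP string ${} is variable syntax, whereas $0 is the reference to the whole match. It might be clearer to write it as \0 in the future (in single quotes, or as \\0 in double quotes, because "\0" on its own is a NUL byte in a double-quoted string).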

Regex demo: https://regex101.com/r/xUvvuf/1/

(Example using the \0. https://3v4l.org/jGZal)
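
To make the effect of \K concrete, here is a small sketch comparing three replacements on the same input. The variable names are made up for this illustration, and the third call (capturing both parts and re-emitting them) is an alternative I am adding for comparison, not something from the answer's demos:

```php
<?php
$html = '<h4><strong>Title</strong></h4>';

// Without \K the character before the tag (here ">") is part of $0,
// so the dot ends up in front of that character and breaks the markup.
$withoutK = preg_replace('~[^,.;!?]</h[1-6]>~i', '.$0', $html);
// $withoutK is '<h4><strong>Title</strong.></h4>'

// With \K the reported match starts at the closing tag, so $0 is just "</h4>".
$withK = preg_replace('~[^,.;!?]\K</h[1-6]>~i', '.$0', $html);
// $withK is '<h4><strong>Title</strong>.</h4>'

// Alternative without \K: capture both parts and put the dot between them.
$withGroups = preg_replace('~([^,.;!?])(</h[1-6]>)~i', '$1.$2', $html);
// $withGroups is '<h4><strong>Title</strong>.</h4>'

echo $withoutK, "\n", $withK, "\n", $withGroups, "\n";
```

The replacement strings are single-quoted here so PHP does not try to interpolate anything before preg_replace sees them.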

Another approach is to skip every closing tag that already has punctuation in front of it; this cuts the number of regex steps down a bit.

https://regex101.com/r/xUvvuf/2/

[,.;!?]<\/h[1-6]>(*SKIP)(*FAIL)|<\/h[1-6]>
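
Dropped into PHP, that pattern might be used roughly as follows. This is a sketch only; the function name addDotUnlessPunctuated is hypothetical and the ~ delimiter is my choice to avoid escaping the slashes:

```php
<?php
// Hypothetical wrapper around the (*SKIP)(*FAIL) pattern.
function addDotUnlessPunctuated(string $input): string
{
    // First branch: punctuation + closing heading tag is consumed and then
    // discarded via (*SKIP)(*FAIL). Second branch: any remaining closing
    // heading tag is matched and gets a dot in front of it.
    $pattern = '~[,.;!?]</h[1-6]>(*SKIP)(*FAIL)|</h[1-6]>~i';

    return preg_replace($pattern, '.$0', $input);
}

$html = "<h2>No change!</h2>\n<h4><strong>Needs a dot</strong></h4>";
echo addDotUnlessPunctuated($html);
// prints:
// <h2>No change!</h2>
// <h4><strong>Needs a dot</strong>.</h4>
```

No \K is needed in this variant, because the branch that actually matches contains only the closing tag.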

You could also change the delimiter, though this is mostly personal preference. If you don't mind escaping each / you can keep the current delimiters; if you do, swap the leading and closing / for a ~.

Demo: https://regex101.com/r/xUvvuf/3/

preg_replace("~[^,.;!?]\K</h[1-6]>~mi", ".$0", $input);
chris85 answered May 18 '26 18:05