PHP regular expression to replace "some words" with a link tag, excluding "some words" already inside link tags

Tags: regex, php

I have HTML content stored in a database table. In that content I want to replace "SOME WORDS" with a link tag, but any "SOME WORDS" that is already inside a link tag should be left alone.

For example, given this content:

<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>

the output should be:

<p>Lorem ipsum dolor <a href="http://someurl">SOME WORDS</a>, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>

As you can see, existing link text should be excluded from the replacement.

Any guidance to get me on the right track would be much appreciated.

EastSw asked Dec 15 '12 06:12


1 Answer

This is how you could solve it using DOMDocument instead of regular expressions:

$contents = <<<EOS
<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus. Quisque blandit, nibh at eleifend ullamcorper</p>
EOS;

$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($contents);
libxml_clear_errors();

$xp = new DOMXPath($doc);

// find all text nodes that are not inside an anchor at any depth
// (ancestor::a also covers nested markup such as <a><b>SOME WORDS</b></a>)
foreach ($xp->query('//text()[not(ancestor::a)]') as $node) {
        $node->nodeValue = str_replace(
            'SOME WORDS',
            'SOME OTHER WORDS',
            $node->nodeValue
        );
}
// DOMDocument creates a full document and puts your fragment inside a body tag
// So we enumerate the children and save their HTML representation
$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
        echo $doc->saveHTML($node);
}
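
The loop above swaps the phrase for other text, while the question asks for a link around it. Markup assigned to nodeValue is escaped rather than parsed, so the anchor has to be created as a DOM element. The following is a minimal sketch of one way to do that, not a definitive implementation: the helper name linkifyPhrase is hypothetical, http://someurl is the placeholder from the question, and the phrase is assumed to be a non-empty string.

```php
<?php
// Hypothetical helper (not from the original answer): wraps every occurrence
// of $phrase in <a href="$url">, skipping text that is already inside a link.
function linkifyPhrase(string $html, string $phrase, string $url): string
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xp = new DOMXPath($doc);

    // Copy the matches into an array first, because the loop edits the tree.
    $nodes = iterator_to_array($xp->query('//body//text()[not(ancestor::a)]'));

    foreach ($nodes as $node) {
        // Assumes $phrase is non-empty; explode() rejects an empty separator.
        $parts = explode($phrase, $node->nodeValue);
        if (count($parts) < 2) {
            continue; // phrase does not occur in this text node
        }

        // Rebuild the text node as: text, link, text, link, ...
        $fragment = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i > 0) {
                $link = $doc->createElement('a');
                $link->setAttribute('href', $url);
                $link->appendChild($doc->createTextNode($phrase));
                $fragment->appendChild($link);
            }
            if ($part !== '') {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // As above, serialize only the children of the generated body element.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with the content from the question
$contents = '<p>Lorem ipsum dolor SOME WORDS, consectetur adipiscing elit. <a href="http://example.com">SOME WORDS</a> elementum pharetra velit at cursus.</p>';
echo linkifyPhrase($contents, 'SOME WORDS', 'http://someurl');
```

Splitting the text with explode() and rebuilding it as a document fragment avoids character-offset arithmetic, so multibyte text needs no special handling at this step.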

Ja͢ck answered Oct 06 '22 18:10