I have a PHP highlighting function which makes certain words bold.
Below is the function. It works great, except when the array $words contains the single-letter value: b
For example someone searches for: jessie j price tag feat b o b
This will have the following entries in the array $words: jessie,j,price,tag,feat,b,o,b
When a 'b' shows up, the whole function goes wrong and outputs a bunch of broken HTML tags. Of course I could strip any 'b' values from the array, but that isn't ideal, because then the highlighting doesn't work as it should for certain queries.
This sample script:
function highlightWords2($text, $words)
{
    $text = ($text);
    foreach ($words as $word)
    {
        $word = preg_quote($word);
        $text = preg_replace("/\b($word)\b/i", '<b>$1</b>', $text);
    }
    return $text;
}
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
echo highlightWords2($string, $words);
Will output:
<<<b>b</b>><b>b</b></<b>b</b>>>jessie</<<b>b</b>><b>b</b></<b>b</b>>> j price <<<b>b</b>><b>b</b></<b>b</b>>>tag</<<b>b</b>><b>b</b></<b>b</b>>> feat <<b>b</b>><b>b</b></<b>b</b>> <<b>b</b>>o</<b>b</b>> <<b>b</b>><b>b</b></<b>b</b>>
And this only happens because there are "b"'s in the array.
Can you guys see anything that I could change to make it work properly?
Your problem is that when the function goes through the text looking for every "b" to bold, it also sees the bold tags it inserted earlier and tries to bold the "b" inside them as well.
@symcbean was close but forgot one thing.
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
print hl($string, $words);
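function hl($inp, $words)
{
    $replace = array_flip(array_flip($words)); // remove duplicates
    $pattern = array();
    foreach ($replace as $k => $fword) {
        // preg_quote() escapes regex metacharacters in the search word;
        // (?!>) skips the "b" inside <b> and </b>
        $pattern[] = '/\b(' . preg_quote($fword, '/') . ')(?!>)\b/i';
        $replace[$k] = '<b>$1</b>';
    }
    return preg_replace($pattern, $replace, $inp);
}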
Note the added "(?!>)". It is a negative lookahead assertion: the word only matches if it is not immediately followed by a ">". That excludes the "b" in both the opening <b> and the closing </b> tag, since each of them is followed by ">". Checking for "<" before the word instead would not catch the closing tag, where the "b" is preceded by "/". With this change the sample input is highlighted as expected.
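As an alternative to the lookahead, the re-matching can be avoided altogether by replacing every word in a single pass, so the inserted tags are never scanned again. The following is only a minimal sketch of that idea, not the code from either answer; the function name highlightOnce is made up for illustration, and it assumes $text is plain text without existing markup.

```php
// Hypothetical helper (not from the original answers): highlight all words in one pass.
function highlightOnce($text, array $words)
{
    // Drop duplicates and empty entries.
    $words = array_filter(array_unique($words), function ($w) {
        return $w !== '';
    });
    if (!$words) {
        return $text;
    }

    // Escape regex metacharacters, including the "/" delimiter.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    // One alternation, one pass: the inserted <b> tags are never rescanned.
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, '<b>$1</b>', $text);
}

echo highlightOnce('jessie j price tag feat b o b', array('jessie', 'tag', 'b', 'o', 'b'));
// prints: <b>jessie</b> j price <b>tag</b> feat <b>b</b> <b>o</b> <b>b</b>
```

Because all words go into one pattern, the order of the words in the array no longer matters.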
Your basic problem is that you replace plain-text strings inside HTML indiscriminately. With short search strings this breaks, because text inside tags and attributes gets replaced as well.
Instead, you need to apply the search and replace only to the text between HTML tags. You also don't want to highlight inside an existing highlight.
Regular expressions are quite limited for this kind of task. Use an HTML parser instead; in PHP that is, for example, DOMDocument. With an HTML parser you can search only inside the text nodes (and not in other things like tags, attributes and comments).
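To make the idea concrete, here is a rough sketch of highlighting that only ever touches text nodes. It is an illustration under assumptions, not the highlighter from the linked answer: the function name highlightHtml and the wrapper id hl-root are invented, the input is assumed to be an HTML fragment in an encoding that loadHTML handles by default, and whole-word matching is done with the same \b pattern as in the question.

```php
// Hypothetical helper: wrap whole-word matches in <b>, touching text nodes only.
function highlightHtml($html, array $words)
{
    $words = array_filter(array_unique($words), function ($w) {
        return $w !== '';
    });
    if (!$words) {
        return $html;
    }
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    $doc = new DOMDocument();
    // Wrap the fragment so there is a known container to read back from.
    $doc->loadHTML('<div id="hl-root">' . $html . '</div>');
    $xp = new DOMXPath($doc);

    // Snapshot the text nodes first, because the loop below modifies the tree.
    // Text that already sits inside a <b> is skipped.
    $query = '//div[@id="hl-root"]//text()[not(ancestor::b)]';
    foreach (iterator_to_array($xp->query($query)) as $node) {
        // Odd indexes hold the matched words, even indexes the text in between.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no match in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $b = $doc->createElement('b');
                $b->appendChild($doc->createTextNode($part));
                $frag->appendChild($b);
            } else {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($frag, $node);
    }

    // Serialize only the children of the wrapper element.
    $out = '';
    foreach ($xp->query('//div[@id="hl-root"]')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo highlightHtml('<a href="b.html" title="b">b o b</a>', array('b', 'o'));
```

The "b" in the href and title attributes is left alone, because attributes are not text nodes and are never passed to the regular expression.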
You can find a text highlighter, with a detailed description of how it works, in a previous answer of mine to the question "Ignore html tags in preg_replace". That question is quite similar to yours, so this snippet is probably helpful. It uses <span> instead of <b> tags:
// $str is the markup to search and $search is the term to highlight.
// loadXML() requires $str to be well-formed XML (e.g. XHTML).
$doc = new DOMDocument;
$doc->loadXML($str);
$xp = new DOMXPath($doc);
$anchor = $doc->getElementsByTagName('body')->item(0);
if (!$anchor)
{
    throw new Exception('Anchor element not found.');
}
// search elements that contain the search-text
$r = $xp->query('//*[contains(., "'.$search.'")]/*[FALSE = contains(., "'.$search.'")]/..', $anchor);
if (!$r)
{
    throw new Exception('XPath failed.');
}
// process search results
foreach($r as $i => $node)
{
    $textNodes = $xp->query('.//child::text()', $node);
    // extract $search textnode ranges, create fitting nodes if necessary
    // (TextRange is not a built-in PHP class; it is a helper from the linked answer)
    $range = new TextRange($textNodes);
    $ranges = array();
    while(FALSE !== $start = strpos($range, $search))
    {
        $base = $range->split($start);
        $range = $base->split(strlen($search));
        $ranges[] = $base;
    }
    // wrap each matching textnode
    foreach($ranges as $range)
    {
        foreach($range->getNodes() as $node)
        {
            $span = $doc->createElement('span');
            $span->setAttribute('class', 'search_hightlight');
            $node = $node->parentNode->replaceChild($span, $node);
            $span->appendChild($node);
        }
    }
}
If you adapt it for multiple search terms, I would add an additional class with a number that depends on the search term, so you can style each term in a different color with CSS.
You should also remove duplicate search terms, and make the XPath expression skip text that is already inside an element carrying the highlight span.
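The two suggestions about duplicates and numbered classes could be combined in a small lookup table. This is only a sketch; the function name highlightClasses and the class names "search_highlight" and "term-N" are placeholders chosen for illustration, not part of the snippet above.

```php
// Hypothetical helper: map each distinct search term to a numbered CSS class.
function highlightClasses(array $terms)
{
    $classes = array();
    foreach ($terms as $term) {
        $key = strtolower($term); // treat "B" and "b" as the same term
        if ($key !== '' && !isset($classes[$key])) {
            $classes[$key] = 'search_highlight term-' . (count($classes) + 1);
        }
    }
    return $classes;
}

$classes = highlightClasses(array('jessie', 'tag', 'b', 'o', 'b'));
// $classes['b'] is now 'search_highlight term-3'
```

When the span is created, the class for the current term could then be looked up with $classes[strtolower($search)] and styled per term in CSS, for example with a different background color for .term-1, .term-2 and so on.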