
Replacing words with tag links in PHP

I have a text ($text) and an array of words ($tags). Each of these words in the text should be replaced with a link to another page, without breaking the links that already exist in the text. CakePHP's TextHelper has a method for this, but it is buggy and breaks the existing HTML links in the text. The method is supposed to work like this:

$text=Text->highlight($text,$tags,'<a href="/tags/\1">\1</a>',1);

Below is the existing code from CakePHP's TextHelper:

function highlight($text, $phrase, $highlighter = '<span class="highlight">\1</span>', $considerHtml = false) {
  if (empty($phrase)) {
    return $text;
  }

  if (is_array($phrase)) {
    $replace = array();
    $with = array();

    foreach ($phrase as $key => $value) {
      $key = $value;
      $value = $highlighter;
      $key = '(' . $key . ')';
      if ($considerHtml) {
        $key = '(?![^<]+>)' . $key . '(?![^<]+>)';
      }
      $replace[] = '|' . $key . '|ix';
      $with[] = empty($value) ? $highlighter : $value;
    }
    return preg_replace($replace, $with, $text);
  } else {
    $phrase = '(' . $phrase . ')';
    if ($considerHtml) {
      $phrase = '(?![^<]+>)' . $phrase . '(?![^<]+>)';
    }

    return preg_replace('|'.$phrase.'|i', $highlighter, $text);
  }
}

Amorphous asked Aug 18 '10 00:08


2 Answers

You can see (and run) this algorithm here:

http://www.exorithm.com/algorithm/view/highlight

It can be made a little better and simpler with a few changes, but it still isn't perfect. Although they are less efficient, I'd recommend one of Ben Doom's solutions.

Mike C answered Oct 05 '22 18:10


Replacing text in HTML is fundamentally different from replacing plain text. To decide whether a word is part of an HTML tag, you first have to find all the tags so that you can skip them. Regex is not really the tool for this.
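
To illustrate the kind of failure involved, here is a small sketch that runs the lookahead pattern from the question's highlight() code against a string that already contains a link. The sample string and the single tag "php" are assumptions made up for the demonstration.

```php
<?php
// Same pattern the question's highlight() builds when $considerHtml is true.
$pattern = '|(?![^<]+>)(php)(?![^<]+>)|i';

// Assumed sample input: one existing link whose text is also a tag word.
$html = '<a href="/php">PHP</a> rocks';

$result = preg_replace($pattern, '<a href="/tags/\1">\1</a>', $html);

echo $result, "\n";
// prints: <a href="/php"><a href="/tags/PHP">PHP</a></a> rocks
```

The lookahead does keep the "php" inside the href attribute untouched, but it only knows about the inside of a single tag. The link text between the opening and closing anchor tags looks like plain text to the pattern, so a second anchor gets nested inside the first.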

I would attempt one of the following solutions:

  • Find the positions of all the words. Working from last to first, determine if each is part of a tag. If not, add the anchor.
  • Split the string into blocks. Each block is either a tag or plain text. Run your replacement(s) on the plain text blocks, and re-assemble.

I think the first one is probably a bit more efficient, but more prone to programmer error, so I'll leave it up to you.
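
As a rough sketch of the second option (split into blocks, replace only in the plain-text blocks, re-assemble), something along these lines might work. The function name linkTags() and the "/tags/" URL prefix are hypothetical, the prefix being borrowed from the question's example call. It assumes reasonably well-formed HTML, and the \b word boundaries assume the tags start and end with word characters.

```php
<?php
/**
 * Hypothetical helper: wrap each tag word found in $html in a link,
 * leaving markup and the text of existing links alone.
 */
function linkTags(string $html, array $tags, string $urlPrefix = '/tags/'): string
{
    if (empty($tags)) {
        return $html;
    }

    // Try longer tags first so a multi-word tag wins over a shorter one it contains.
    usort($tags, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $quoted = array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $tags);
    $pattern = '~\b(' . implode('|', $quoted) . ')\b~i';

    // Even indexes hold plain text, odd indexes hold the captured tags.
    $blocks = preg_split('~(<[^>]*>)~', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $insideAnchor = false;
    foreach ($blocks as $i => $block) {
        if ($i % 2 === 1) {
            // A tag block: only track whether we are inside <a> ... </a>.
            if (preg_match('~^<a[\s>]~i', $block)) {
                $insideAnchor = true;
            } elseif (preg_match('~^</a\s*>~i', $block)) {
                $insideAnchor = false;
            }
            continue;
        }
        if ($insideAnchor || $block === '') {
            continue; // never add a link inside an existing link
        }
        $blocks[$i] = preg_replace_callback($pattern, function ($m) use ($urlPrefix) {
            return '<a href="' . $urlPrefix . rawurlencode($m[1]) . '">' . $m[1] . '</a>';
        }, $block);
    }

    return implode('', $blocks);
}

$html = 'Learn <a href="/php">PHP basics</a> and PHP with <img alt="php logo"> today';
echo linkTags($html, ['php']), "\n";
// prints: Learn <a href="/php">PHP basics</a> and <a href="/tags/PHP">PHP</a> with <img alt="php logo"> today
```

All tags go into one alternation and are replaced in a single pass, so a link inserted for one tag is never rescanned for another. The split itself still uses a small regex, but only to find tag boundaries; comments, script blocks, or a ">" inside an attribute value would need a real HTML parser.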

If you want to know why I'm not approaching this problem directly, look at all the questions on the site about regex and HTML, and how regex is not a parser.

Ben Doom answered Oct 05 '22 19:10