
Best practices on displaying search results with associated text snippets from actual result

Tags:

php

search

mysql

I have a decent, lightweight search engine working for one of my sites, using MySQL fulltext indexes and PHP to parse the results. It works fine, but I'd like to offer more Google-like results, with text snippets taken from each result and the matched words highlighted. I'm looking for a PHP-based solution. Any recommendations?

phirschybar asked Feb 28 '23 11:02


2 Answers

Searching the actual database is fine until you want to add snazzy features like the one above. In my experience it is best to create a dedicated search table, with keywords and page IDs/URLs/etc. Then populate this table every n hours with content. During this population you can add snippets for each document for each keyword.
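A rough sketch of what that population step could produce, assuming one row per keyword and page. The function names, the row shape (keyword, page_id, snippet) and the "three or more letters" keyword rule are all assumptions for illustration, not part of any real schema; the actual write to the search table is left out.

```php
<?php
// Hypothetical helper: cut a window of $size characters either side of offset $pos.
function snippet_around(string $text, int $pos, int $length, int $size): string
{
    $start = max(0, $pos - $size); // do not run past the start of the text
    return substr($text, $start, ($pos - $start) + $length + $size);
}

// Build one row per (keyword, page) pair. The row shape is an assumption.
function build_search_rows(array $pages, int $size = 30): array
{
    $rows = [];
    foreach ($pages as $pageId => $text) {
        // Distinct lowercased words of three or more ASCII letters act as keywords.
        preg_match_all('/[a-z]{3,}/', strtolower($text), $matches);
        foreach (array_unique($matches[0]) as $keyword) {
            // First case-insensitive occurrence; it may sit inside a longer word.
            $pos = stripos($text, $keyword);
            $rows[] = [
                'keyword' => $keyword,
                'page_id' => $pageId,
                'snippet' => snippet_around($text, $pos, strlen($keyword), $size),
            ];
        }
    }
    return $rows;
}

$pages = [
    1 => 'It could be red, green or blue.',
    2 => 'Red is a warm colour.',
];
$rows = build_search_rows($pages, 5);
// The scheduled job would then write each row to the search table,
// for example with a prepared INSERT statement.
```

At search time the site only has to look up the keyword and print the stored snippet, so no text processing happens per request.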

Alternatively a quick hack might be:

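<?php
$text = 'This is an example text page with content. It could be red, green or blue.';
$keyword = 'red';
$size = 5; // number of characters to keep on either side of the keyword

$pos = strpos($text, $keyword);
if ($pos !== false) { // strpos returns false when the keyword is absent
    $start = max(0, $pos - $size); // do not run past the start of the text
    // substr takes a length, not an end position
    $snippet = '...'.substr($text, $start, ($pos - $start) + strlen($keyword) + $size).'...';
    $snippet = str_replace($keyword, '<strong>'.$keyword.'</strong>', $snippet);
    echo $snippet;
}
?>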
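
The quick hack above cuts the text at a fixed character offset, so a snippet can start or end in the middle of a word. One possible refinement, sketched here with a hypothetical word_snippet() helper (the name and the default size are assumptions), trims both ends back to the nearest space:

```php
<?php
// Hypothetical helper, not from the answer above.
function word_snippet(string $text, string $keyword, int $size = 20): ?string
{
    $pos = stripos($text, $keyword);
    if ($pos === false) {
        return null; // keyword not present, nothing to show
    }
    $start = max(0, $pos - $size);
    $end = min(strlen($text), $pos + strlen($keyword) + $size);

    // Move the start forward to just after the next space so the first word is whole.
    if ($start > 0) {
        $space = strpos($text, ' ', $start);
        if ($space !== false && $space < $pos) {
            $start = $space + 1;
        }
    }
    // Move the end back to the last space so the final word is whole.
    if ($end < strlen($text)) {
        $space = strrpos(substr($text, 0, $end), ' ');
        if ($space !== false && $space > $pos + strlen($keyword)) {
            $end = $space;
        }
    }
    $snippet = substr($text, $start, $end - $start);
    // Only add an ellipsis on a side where text was actually cut off.
    return ($start > 0 ? '...' : '') . $snippet . ($end < strlen($text) ? '...' : '');
}

$text = 'This is an example text page with content. It could be red, green or blue.';
echo word_snippet($text, 'red', 10); // ...could be red, green...
```

This works on bytes, so for multibyte text the mb_* string functions would be the safer choice.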
Al. answered Apr 27 '23 23:04


For MySQL, your best bet would be to first split up your query words, clean up your values, and then concatenate everything back into a nice regular expression.

In order to highlight your results, you can use the <strong> tag. Its usage would be semantic as you are putting strong emphasis on an item.

// Done ONCE per page load:
  $search = "Hello World";

  // Remove the quotes
  $search = str_replace('"', '', $search);

  // Get the words array (split on any run of whitespace)
  $words = preg_split('/\s+/', $search, -1, PREG_SPLIT_NO_EMPTY);

  // Clean the array: drop stop words as whole words only (a plain string
  // replace of 'or' would turn "World" into "Wld"), remove duplicates, escape
  function is_not_stop_word($value) { return !in_array(strtolower($value), array('and', 'or'), true); }
  function regex_escape(&$value) { $value = preg_quote($value, '/'); }
  $words = array_filter($words, 'is_not_stop_word');
  $words = array_unique(array_map('strtolower', $words));
  array_walk($words, 'regex_escape');

  // PHP's PCRE has no "g" modifier; preg_replace already replaces every match
  $regex = '/(' . implode('|', $words) . ')/i';

// Done FOR EACH result
  $result = "Something something hello there yes world fun nice";
  // Skip highlighting when no search words are left after cleaning
  $highlighted = $words ? preg_replace($regex, '<strong>$0</strong>', $result) : $result;
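
The highlighting step above inserts <strong> tags into the result text as-is. If the results are stored as plain text, they also need HTML-escaping before being printed. A minimal sketch of one way to combine the two, assuming a hypothetical highlight_words() helper: the text is split on the matched words first and each piece is escaped separately, so the inserted tags survive and nothing inside an HTML entity can be matched by accident.

```php
<?php
// Hypothetical helper: highlight whole-word matches and HTML-escape everything else.
function highlight_words(string $text, array $words): string
{
    $patterns = [];
    foreach ($words as $word) {
        $word = trim($word);
        if ($word !== '') {
            $patterns[] = preg_quote($word, '/');
        }
    }
    if ($patterns === []) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8'); // nothing to highlight
    }
    // \b keeps matches to whole words; "i" ignores case; "u" treats input as UTF-8.
    $regex = '/\b(' . implode('|', $patterns) . ')\b/iu';
    $parts = preg_split($regex, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8'); // e.g. invalid UTF-8 input
    }
    $out = '';
    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        // With a single capture group, odd indexes hold the matched words.
        $out .= ($i % 2 === 1) ? '<strong>' . $escaped . '</strong>' : $escaped;
    }
    return $out;
}

echo highlight_words('Hello <b>there</b> world, say hello', ['hello', 'world']);
// <strong>Hello</strong> &lt;b&gt;there&lt;/b&gt; <strong>world</strong>, say <strong>hello</strong>
```

The \b anchors assume search terms that begin and end with a letter or digit; a term such as "c++" would need a different boundary rule.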

If you are using PostgreSQL, you can simply use the built-in ts_headline as described in the documentation.

Andrew Moore answered Apr 27 '23 22:04