I have a decent, lightweight search engine working for one of my sites, using MySQL fulltext indexes and PHP to parse the results. It works fine, but I'd like to offer more Google-like results, with text snippets from the matching pages and the found words highlighted. I'm looking for a PHP-based solution. Any recommendations?
Searching the actual database is fine until you want to add snazzy features like the one above. In my experience it is best to create a dedicated search table, with keywords and page IDs/URLs/etc. Then populate this table every n hours with content. During this population you can add snippets for each document for each keyword.
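A rough sketch of that population step might look like the following. The function name build_search_rows, the row layout (keyword, page_id, snippet), the 30-character radius and the 3-letter minimum are all assumptions for illustration; a real job would read the pages from the database and write the rows into the search table. It also assumes plain ASCII text.

```php
<?php
// Hypothetical helper: turn pages into rows for a dedicated search table.
// $pages maps page ID => plain-text content (assumed already stripped of HTML).
function build_search_rows(array $pages, int $radius = 30): array
{
    $rows = [];
    foreach ($pages as $pageId => $content) {
        // Split on anything that is not a letter or digit; lowercase for matching.
        $words = preg_split('/[^a-z0-9]+/', strtolower($content), -1, PREG_SPLIT_NO_EMPTY);
        foreach (array_unique($words) as $word) {
            if (strlen($word) < 3) {
                continue; // skip very short words; the threshold is an arbitrary choice
            }
            // First case-insensitive occurrence (it may sit inside a longer word).
            $pos = stripos($content, $word);
            $start = max(0, $pos - $radius);
            $rows[] = [
                'keyword' => $word,
                'page_id' => $pageId,
                'snippet' => substr($content, $start, ($pos - $start) + strlen($word) + $radius),
            ];
        }
    }
    return $rows;
}

// One row per distinct keyword per page, ready to be inserted by the scheduled job.
$rows = build_search_rows([7 => 'It could be red, green or blue.']);
```

At query time the search then becomes a simple lookup on the keyword column, and the snippet is already there to display.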
Alternatively a quick hack might be:
<?php
$text = 'This is an example text page with content. It could be red, green or blue.';
$keyword = 'red';
$size = 5; // size of snippet either side of keyword
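$pos = strpos($text, $keyword); // assumes the keyword occurs in $text; strpos() returns false otherwise
$start = max(0, $pos - $size); // clamp so the snippet never starts before the beginning of the text
// substr() takes a length (not an end offset), and strlen() (not sizeof()) measures a string
$snippet = '...'.substr($text, $start, ($pos - $start) + strlen($keyword) + $size).'...';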
$snippet = str_replace($keyword, '<strong>'.$keyword.'</strong>', $snippet);
echo $snippet;
?>
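The quick hack above assumes the keyword is present and matches case exactly. A slightly more defensive variant could look like this. The name make_snippet is hypothetical, and the choices to match case-insensitively, escape the text with htmlspecialchars() and return null when nothing matches are assumptions, not part of the original answer. It works on bytes, so it assumes ASCII text; UTF-8 content would need the mb_* equivalents.

```php
<?php
// Hypothetical helper: return an HTML snippet around the first match, or null if none.
function make_snippet(string $text, string $keyword, int $size = 5): ?string
{
    if ($keyword === '') {
        return null;
    }
    $pos = stripos($text, $keyword); // case-insensitive search
    if ($pos === false) {
        return null;
    }
    // Clamp both ends of the window to the bounds of the text.
    $start = max(0, $pos - $size);
    $end = min(strlen($text), $pos + strlen($keyword) + $size);
    // Escape first so the only markup in the output is our own <strong> tag.
    $snippet = htmlspecialchars(substr($text, $start, $end - $start));
    // Highlight every occurrence inside the snippet, keeping the original casing.
    $pattern = '/' . preg_quote(htmlspecialchars($keyword), '/') . '/i';
    $snippet = preg_replace($pattern, '<strong>$0</strong>', $snippet);
    // Only add an ellipsis on a side that was actually cut off.
    return ($start > 0 ? '...' : '') . $snippet . ($end < strlen($text) ? '...' : '');
}

// prints: ...d be <strong>red</strong>, gre...
echo make_snippet('This is an example text page with content. It could be red, green or blue.', 'RED');
```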
For MySQL, your best bet would be to first split up your query words, clean up your values, and then concatenate everything back into a nice regular expression.
In order to highlight your results, you can use the <strong> tag. Its usage would be semantic as you are putting strong emphasis on an item.
// Done ONCE per page load:
$search = "Hello World";
// Remove the quotes and stop words (whole words only, so the "or" in "World" is left alone)
$search = preg_replace('/"|\b(?:and|or)\b/i', '', $search);
// Get the words array
$words = explode(' ', $search);
// Clean the array, remove duplicates, etc.
function remove_empty_values($value) { return trim($value) != ''; }
function regex_escape(&$value) { $value = preg_quote($value, '/'); }
$words = array_filter($words, 'remove_empty_values');
$words = array_unique($words);
array_walk($words, 'regex_escape');
// PHP's PCRE has no "g" modifier; preg_replace() already replaces every match
$regex = '/(' . implode('|', $words) . ')/i';
// Done FOR EACH result
$result = "Something something hello there yes world fun nice";
$highlighted = preg_replace($regex, '<strong>$0</strong>', $result);
If you are using PostgreSQL, you can simply use the built-in ts_headline function, as described in the documentation.
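A minimal sketch of calling ts_headline from PHP through PDO might look like this. The table name pages, its columns id, title and body, the 'english' text search configuration and the helper name search_with_headlines are all assumptions for illustration, and the function needs a live PostgreSQL connection to actually return rows.

```php
<?php
// Assumed schema: pages(id, title, body). ts_headline() returns an excerpt of
// body with the matching terms wrapped in the StartSel/StopSel markers.
const HEADLINE_SQL = <<<'SQL'
SELECT id,
       title,
       ts_headline('english', body, query,
                   'StartSel=<strong>, StopSel=</strong>, MaxWords=35, MinWords=15') AS snippet
FROM pages, plainto_tsquery('english', :q) AS query
WHERE to_tsvector('english', body) @@ query
LIMIT 10
SQL;

// Hypothetical helper: run the search and return rows with id, title and snippet.
function search_with_headlines(PDO $pdo, string $terms): array
{
    $stmt = $pdo->prepare(HEADLINE_SQL);
    $stmt->execute([':q' => $terms]); // the user's search terms are bound, never concatenated
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```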