I'm doing a fulltext search on my MySQL table "pages". I'm displaying a list of pages that match the keyword in their "title" (plain text, VARCHAR, 255) or "content" (html, TEXT). When the match is found in the "content" field, I'd like to display the snippet in which the match was found. I have no idea how to go about this.
Can you put me in the right direction?
$query = ' SELECT
*,
MATCH(title, content) AGAINST("'.$keyword.'") AS score
FROM
page
WHERE
MATCH(title, content) AGAINST("'.$keyword.'")
ORDER BY
score
DESC ';
$result = mysql_query($query) or die (mysql_error());
if(mysql_num_rows($result) > 0) {
$output .= '<p>Your keyword matches the following pages:</p>';
while($row = mysql_fetch_assoc($result)){
$title = htmlentities($row['title']);
$content = htmlentities(strip_tags($row['content']));
$content = limit_text($content, 250); // Cuts it down to 250 characters plus ...
$output .= '<h2>'.$title.'</h2>';
if(trim($content) != '') {
$output .= '<p>'.$content.'</p>'; // I'd like to place a snippet here with the matched context
}
}
} else {
$output .= '<p>Keyword not found...</p>';
}
Also, I have a question regarding security. Right now I'm checking $keyword
in three ways:
I use a regular expression to match the following, to see if the user input is dangerous
<script|<script|>script|document.|alert|bcc:|cc:|x-mailer:|to:|recipient|truncate|drop table
This might be a little bit ridiculous and easy to work around, but it is at least a minimal form of protection against XSS exploits. What is the recommended way to secure filter a keyword intended for search? Is PHPIDS overkill?
This should get you started on the "context" part...
// return the part of the content where the keyword was matched
function get_surrounding_text($keyword, $content, $padding)
{
$position = strpos($content, $keyword);
// starting at (where keyword was found - padding), retrieve
// (padding + keyword length + padding) characters from the content
$snippet = substr($content, $position - $padding, (strlen($keyword) + $padding * 2));
return '...' . $snippet . '...';
}
$content = 'this is a really long string of characters with a magic word buried somewhere in it';
$keyword = 'magic';
echo get_surrounding_text($keyword, $content, 15); // echoes '... string with a magic word in it...'
This function does not account for cases where the padding boundaries would go outside the content string, like when the keyword is found near the beginning or end of the content. It also doesn't account for multiple matches, etc. But it should hopefully at least point you in the right direction.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With