Retrieving matched context of MySQL fulltext search in PHP (and security)

I'm running a fulltext search on my MySQL table "pages" and displaying a list of pages that match the keyword in their "title" (plain text, VARCHAR(255)) or "content" (HTML, TEXT) column. When the match is in the "content" column, I'd like to display the snippet of text in which the match was found, but I have no idea how to go about this.

Can you point me in the right direction?

$query = '  SELECT 
                *, 
                MATCH(title, content) AGAINST("'.$keyword.'") AS score 
            FROM 
                page 
            WHERE 
                MATCH(title, content) AGAINST("'.$keyword.'")
            ORDER BY 
                score 
            DESC    ';
$result = mysql_query($query) or die (mysql_error());
if(mysql_num_rows($result) > 0) {   
    $output .= '<p>Your keyword matches the following pages:</p>';
    while($row = mysql_fetch_assoc($result)){

        $title      = htmlentities($row['title']);
        $content    = htmlentities(strip_tags($row['content']));
        $content    = limit_text($content, 250); // Cuts it down to 250 characters plus ...

        $output .= '<h2>'.$title.'</h2>';
        if(trim($content) != '') {
            $output .= '<p>'.$content.'</p>'; // I'd like to place a snippet here with the matched context
        }           
    }   
} else {
    $output .= '<p>Keyword not found...</p>';       
}

Also, I have a question regarding security. Right now I'm checking $keyword in three ways:

  • Not blank?
  • More than 2 characters?
  • Not dangerous? (see below)

I use a regular expression that matches any of the following to decide whether the user input is dangerous:

<script|&lt;script|&gt;script|document.|alert|bcc:|cc:|x-mailer:|to:|recipient|truncate|drop table

This might be a little bit ridiculous and easy to work around, but it is at least a minimal form of protection against XSS exploits. What is the recommended way to securely filter a keyword intended for search? Is PHPIDS overkill?
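
A blacklist like this is easy to bypass and also rejects harmless searches (a page about JavaScript's alert() could never be found). The two risks are usually handled at different points instead: SQL injection by binding the keyword as a query parameter, and XSS by escaping the keyword whenever it is echoed into HTML. What follows is a minimal sketch, assuming PDO with the MySQL driver is available. The helper names `search_pages` and `escape_html` are hypothetical, and the `page` table and its columns are taken from the query above.

```php
<?php
// Hypothetical helper: escape any value just before it is echoed into HTML.
function escape_html(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: run the fulltext search with the keyword bound as a
// parameter, so it is never concatenated into the SQL string.
// Assumes $pdo is an open PDO connection to the MySQL database.
function search_pages(PDO $pdo, string $keyword): array
{
    $sql = 'SELECT *, MATCH(title, content) AGAINST(:kw1) AS score
            FROM page
            WHERE MATCH(title, content) AGAINST(:kw2)
            ORDER BY score DESC';
    $stmt = $pdo->prepare($sql);
    // Two placeholder names are used because a named placeholder cannot be
    // repeated in one statement unless PDO emulates prepares.
    $stmt->execute([':kw1' => $keyword, ':kw2' => $keyword]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Escaping on output turns markup in the keyword into harmless text.
echo '<p>Results for: ' . escape_html('<script>alert(1)</script>') . '</p>';
```

With binding and output escaping in place, the "not blank" and minimum-length checks can stay as plain usability validation, since they are no longer what stands between the keyword and the database or the page.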

chocolata asked Mar 04 '11 00:03


1 Answer

This should get you started on the "context" part...

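// return the part of the content around the first occurrence of the keyword
function get_surrounding_text($keyword, $content, $padding)
{
    // stripos() ignores case, as MySQL fulltext matching does on a
    // case-insensitive collation; it returns false when there is no match
    $position = stripos($content, $keyword);
    if ($position === false) {
        return '';
    }
    // start $padding characters before the keyword, but never before the
    // beginning of the string (a negative start would count from the end)
    $start = max(0, $position - $padding);
    // keep up to $padding characters after the keyword; substr() stops
    // at the end of the string on its own
    $length = ($position - $start) + strlen($keyword) + $padding;
    $snippet = substr($content, $start, $length);
    return '...' . $snippet . '...';
}

$content = 'this is a really long string of characters with a magic word buried somewhere in it';
$keyword = 'magic';
echo get_surrounding_text($keyword, $content, 15); // echoes '...racters with a magic word buried so...'

The function clamps the window to the start of the content and returns an empty string when the keyword is not found. It still only looks at the first match, counts bytes rather than characters (mb_stripos() and mb_substr() would be needed for multibyte text), and can cut words in half at the snippet edges. Run it on the strip_tags() output before calling htmlentities(), so the offsets refer to plain text. But it should hopefully at least point you in the right direction.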
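
For content with several matches, one option is to walk through the string and build one snippet per occurrence. This is a rough sketch rather than a finished implementation: `get_all_snippets` and its `$limit` parameter are hypothetical, and it assumes plain-text input (tags already stripped) where byte offsets are acceptable.

```php
<?php
// Hypothetical helper: return up to $limit snippets, one per occurrence
// of $keyword. Assumes $content is plain text and that byte offsets are
// good enough (single-byte encoding or ASCII keywords).
function get_all_snippets(string $keyword, string $content, int $padding, int $limit = 3): array
{
    $snippets = [];
    $offset = 0;
    while (count($snippets) < $limit
        && ($position = stripos($content, $keyword, $offset)) !== false) {
        // Clamp the window start so it never goes before the string.
        $start = max(0, $position - $padding);
        $length = ($position - $start) + strlen($keyword) + $padding;
        $snippets[] = '...' . substr($content, $start, $length) . '...';
        // Resume after this window so snippets do not overlap, and never
        // pass an offset beyond the end of the string to stripos().
        $offset = min(strlen($content), $start + $length);
    }
    return $snippets;
}

$content = 'magic at the start, and more magic near the end';
print_r(get_all_snippets('magic', $content, 5));
```

Skipping past each window keeps the snippets from overlapping, at the cost of ignoring a second match that falls inside a window already shown.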

glomad answered Nov 14 '22 23:11