
Find words around search term (snippet) with javascript

I'm returning search results from some JSON using jlinq, and I'd like to show the user a snippet of the result text that contains the search term: say, three words before the search term and three words after.

var searchTerm = 'rain'
var text = "I'm singing in the rain, just singing in the rain";

Result would be something like "singing in the rain, just singing in"

How could I do this in JavaScript? I've seen some suggestions using PHP, but nothing specifically for JavaScript.

Judson asked Mar 25 '23 05:03


2 Answers

Here is a slightly better approximation:

function getMatch(string, term)
{
    var index = string.indexOf(term) // declared with var so it does not leak as a global
    if(index >= 0)
    {
        var _ws = [" ","\t"] // characters treated as word separators

        var whitespace = 0
        var rightLimit = 0
        var leftLimit = 0

        // right trim index: advance until four separators have been passed
        for(rightLimit = index + term.length; whitespace < 4; rightLimit++)
        {
            if(rightLimit >= string.length){break}
            if(_ws.indexOf(string.charAt(rightLimit)) >= 0){whitespace += 1}
        }

        whitespace = 0
        // left trim index: the same walk, moving backwards from the match
        for(leftLimit = index; whitespace < 4; leftLimit--)
        {
            if(leftLimit < 0){break}
            if(_ws.indexOf(string.charAt(leftLimit)) >= 0){whitespace += 1}
        }
        // substring takes (start, end); substr would treat rightLimit as a length
        return string.substring(leftLimit + 1, rightLimit).trim() // return match
    }
    return // return nothing (undefined)
}

This is a little bit "greedy", hehe, but it should do the trick. Note the _ws array: you can add any other whitespace characters you like, or switch to a regex to test for whitespace.

This has been slightly modified to handle phrases. It only finds the first occurrence of the term. Dealing with multiple occurrences would require a slightly different strategy.
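As a rough sketch of what that different strategy could look like, one option is to keep calling indexOf with a start offset and collect one snippet per hit. The name getAllMatches and its wordCount parameter are made up for this example and are not part of the answer above; like getMatch, it counts individual whitespace characters, so runs of spaces shorten the context.

```javascript
// Hypothetical helper: one snippet per occurrence of `term`, with up to
// `wordCount` words of context on each side.
function getAllMatches(text, term, wordCount) {
    var snippets = [];
    if (!term) { return snippets; }      // an empty term would never advance
    var isSpace = /\s/;                  // regex check instead of a fixed list
    var index = text.indexOf(term);
    while (index !== -1) {
        var left = index;
        var right = index + term.length;
        var spaces;
        // move right until wordCount + 1 whitespace characters were passed
        for (spaces = 0; right < text.length && spaces <= wordCount; right++) {
            if (isSpace.test(text.charAt(right))) { spaces++; }
        }
        // move left the same way, looking at the character before `left`
        for (spaces = 0; left > 0 && spaces <= wordCount; left--) {
            if (isSpace.test(text.charAt(left - 1))) { spaces++; }
        }
        snippets.push(text.substring(left, right).trim());
        // continue searching after the current occurrence
        index = text.indexOf(term, index + term.length);
    }
    return snippets;
}

var text = "I'm singing in the rain, just singing in the rain";
console.log(getAllMatches(text, "rain", 3));
// [ 'singing in the rain, just singing in', 'singing in the rain' ]
```

Each snippet is computed independently, so snippets for nearby occurrences may overlap.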

It occurred to me that what you want is also possible (in varying degrees) with the following:

function snippet(stringToSearch, phrase)
{
    // build the pattern with the RegExp constructor instead of eval
    var regExp = new RegExp("(\\S+\\s){0,3}\\S*" + phrase + "\\S*(\\s\\S+){0,3}", "g")
    // returns an array containing all matches (null if there are none)
    return stringToSearch.match(regExp)
}

The only possible problem with this is that matches cannot overlap: once it grabs the first occurrence of your pattern, the search resumes after the matched text, so a later occurrence can come back with fewer words of leading context. You also need to be careful that the "phrase" variable doesn't contain any regExp special characters (or escape them first, for example with a backslash or a hex escape).
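Both caveats can be seen in a small sketch. The names escapeRegExp and safeSnippet are invented for this example; the escaping pattern is the commonly used one for regex metacharacters, and the variant returns an empty array instead of null when nothing matches.

```javascript
// Put a backslash in front of every character that is special in a regex.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical variant of snippet() that escapes the phrase first.
function safeSnippet(stringToSearch, phrase) {
    var regExp = new RegExp(
        "(\\S+\\s){0,3}\\S*" + escapeRegExp(phrase) + "\\S*(\\s\\S+){0,3}",
        "g"
    );
    return stringToSearch.match(regExp) || [];
}

var text = "I'm singing in the rain, just singing in the rain";
console.log(safeSnippet(text, "rain"));
// [ 'singing in the rain, just singing in', 'the rain' ]

console.log(safeSnippet("costs $5.00 (approx.) today", "$5.00"));
// [ 'costs $5.00 (approx.) today' ]
```

The second match in the first call only has "the" in front of it, because "just singing in" was already consumed by the first match.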

At any rate, I hope this helps man! :)

bool32 answered Apr 02 '23 10:04


First, we need to find the first occurrence of the term. Since we are dealing with an array of words rather than a string, we had better find the first occurrence of the term in that array. I decided to attach this method to Array's prototype. We could use the array's own indexOf, but if we split the string by " ", we will get words like "rain," and indexOf wouldn't match them.

Array.prototype.firstOccurance = function(term) {
    // plain indexed loop: for...in would also visit firstOccurance itself
    for (var i = 0; i < this.length; i++) {
        if (this[i].indexOf(term) != -1 ) {  // still can use indexOf on a string, right? :)
            return i;
        }
    }
    return -1; // not found, same convention as indexOf
};

Then I split the string into words by splitting it on " ":

function getExcerpt(text, searchTerm, precision) {
    var words = text.split(" "),
        index = words.firstOccurance(searchTerm),
        startIndex, stopIndex;

    // no match: index is not a valid position, so there is nothing to excerpt
    if (!(index >= 0)) {
        return '';
    }

    // now we need first <precision> words before and after searchTerm
    // we can use slice for this matter
    // but we need to know what is our startIndex and stopIndex
    // since simple subtraction from index could lead us to
    // a negative value
    // and adding to an index could get us to exceeding words array length

    startIndex = index - precision;
    if (startIndex < 0) {
        startIndex = 0;
    }

    stopIndex = index + precision + 1;
    if (stopIndex > words.length) {
        stopIndex = words.length;
    }

    // one slice covers the words before, the match itself and the words after
    return words.slice(startIndex, stopIndex).join(' '); // join back
}

Results:

> getExcerpt("I'm singing in the rain, just singing in the rain", 'rain', 3)
'singing in the rain, just singing in'


> getExcerpt("I'm singing in the rain, just singing in the rain", 'rain', 2)
'in the rain, just singing'


> getExcerpt("I'm singing in the rain, just singing in the rain", 'rain', 10)
'I\'m singing in the rain, just singing in the rain'
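
If you would rather not extend Array.prototype (the added method becomes visible on every array, including in for...in loops), the same idea fits in one self-contained function. This is only a sketch: the name getExcerptStandalone is made up here, and splitting on /\s+/ instead of " " is an assumption so that tabs and repeated spaces also separate words.

```javascript
// Hypothetical standalone variant: same clamped-window logic as getExcerpt,
// without adding anything to Array.prototype.
function getExcerptStandalone(text, searchTerm, precision) {
    var words = text.trim().split(/\s+/);
    var index = -1;
    // find the first word that contains the term ("rain," contains "rain")
    for (var i = 0; i < words.length; i++) {
        if (words[i].indexOf(searchTerm) !== -1) {
            index = i;
            break;
        }
    }
    if (index === -1) {
        return ''; // term not found
    }
    // clamp the window to the bounds of the words array
    var start = Math.max(0, index - precision);
    var stop = Math.min(words.length, index + precision + 1);
    return words.slice(start, stop).join(' ');
}

var text = "I'm singing in the rain, just singing in the rain";
console.log(getExcerptStandalone(text, 'rain', 2));
// in the rain, just singing
```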

Nemoden answered Apr 02 '23 10:04