Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimizing a simple search algorithm

I have been playing around a bit with a fairly simple, home-made search engine, and I'm now twiddling with some relevancy sorting code.

It's not very pretty, but I'm not very good when it comes to clever algorithms, so I was hoping I could get some advice :)

Basically, I want each search result to get scoring based on how many words match the search criteria. 3 points per exact word and one point for partial matches

For example, if I search for "winter snow", these would be the results:

  • winter snow => 6 points
  • winter snowing => 4 points
  • winterland snow => 4 points
  • winter sun => 3 points
  • winterland snowing => 2 points

Here's the code:

String[] resultWords = result.split(" ");
String[] searchWords = searchStr.split(" ");
int score = 0;
for (String resultWord : resultWords) {
    for (String searchWord : searchWords) {
        if (resultWord.equalsIgnoreCase(searchWord))
            score += 3;
        else if (resultWord.toLowerCase().contains(searchWord.toLowerCase()))
            score++;
    }
}
like image 335
Ace Avatar asked May 14 '09 10:05

Ace


People also ask

How to optimize search in java?

Basic optimization can be done by preprocessing your database: don't split entries into words every time. Build words list (prefer hash or binary tree to speedup search in the list) for every entry during adding it into DB, remove all too short words, lower case and store this data for further usage.

When would you use linear search?

Linear search is usually very simple to implement, and is practical when the list has only a few elements, or when performing a single search in an un-ordered list. When many values have to be searched in the same list, it often pays to pre-process the list in order to use a faster method.


2 Answers

Your code seems ok to me. I suggest little changes:

Since your are going through all possible combinations you might get the toLowerCase() of your back at the start.

Also, if an exact match already occurred, you don't need to perform another equals.

    result = result.toLowerCase();
    searchStr = searchStr.toLowerCase();

    String[] resultWords = result.split(" ");
    String[] searchWords = searchStr.split(" ");
    int score = 0;
    for (String resultWord : resultWords) {
        boolean exactMatch = false;
        for (String searchWord : searchWords) {
            if (!exactMatch && resultWord.equals(searchWord)) {
                exactMatch = true;
                score += 3;
            } else if (resultWord.contains(searchWord))
                score++;
        }
    }

Of course, this is a very basic level. If you are really interested in this area of computer science and want to learn more about implementing search engines start with these terms:

  • Natural Language Processing
  • Information retrieval
  • Text mining
like image 196
bruno conde Avatar answered Oct 13 '22 00:10

bruno conde


  • stemming
  • for acronyms case sensitivity is important, i.e. SUN; any word that matches both content and case must be weighted more than 3 points (5 or 7)?
  • use the strategy design pattern

For example, consider this naive score model:

interface ScoreModel {
     int startingScore();
     int partialMatch();
     int exactMatch();
}

...

int search(String result, String searchStr, ScoreModel model) {
    String[] resultWords = result.split(" ");
    String[] searchWords = searchStr.split(" ");
    int score = model.startingScore();

    for (String resultWord : resultWords) {
        for (String searchWord : searchWords) {
            if (resultWord.equalsIgnoreCase(searchWord)) {
                score += model.exactMatch();
            } else if (resultWord.toLowerCase().contains(searchWord.toLowerCase())) {
                score += model.partialMatch();
            }
        }
    }

    return score;
}
like image 22
dfa Avatar answered Oct 13 '22 01:10

dfa