
Lucene.NET stemming problem

I'm running into a problem using the SnowballAnalyzer in Lucene.NET. It works well for some words, but for others it returns no results at all, and I'm not sure how to dig further to find out what is happening. I am testing the search on the USDA Food Description file, which can be found here (http://www.ars.usda.gov/SP2UserFiles/Place/12354500/Data/SR23/asc/FOOD_DES.txt). I'm using the English stemming algorithm. I get the following results when searching for "eggs":

Bagels, egg
Bread, egg
Egg, whole, raw, fresh
Egg, white, raw, fresh
Egg, yolk, raw, fresh
Egg, yolk, raw, frozen
Egg, whole, cooked, fried
...

Those results are great. However, I get no results at all when searching for "apple". When I use the StandardAnalyzer and search for "apple", I get the following results:

Croissants, apple
Strudel, apple,
Babyfood, juice, apple
Babyfood, apple-banana juice
...

Not the best results, but at least it's showing something. Does anyone know why the stemming analyzer would filter in such a way that I get no results at all?

Edit: Here is my prototype code that I'm working with.

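// Requires: using System.Collections.Generic; using System.IO;
static string[] Search(string searchTerm)
{
    // Analyzer under test: swap the commented line to switch between Snowball and Standard.
    //Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Snowball.SnowballAnalyzer("English");
    Lucene.Net.Analysis.Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer();

    // Parse the search term against the "text" field using the chosen analyzer.
    Lucene.Net.QueryParsers.QueryParser parser = new Lucene.Net.QueryParsers.QueryParser(Lucene.Net.Util.Version.LUCENE_29, "text", analyzer);
    Lucene.Net.Search.Query query = parser.Parse(searchTerm);

    // Open the on-disk index read-only.
    Lucene.Net.Search.Searcher searcher = new Lucene.Net.Search.IndexSearcher(Lucene.Net.Store.FSDirectory.Open(new DirectoryInfo("./index/")), true);
    try
    {
        // Fetch the top 10 hits and collect the stored "raw" field of each.
        var topDocs = searcher.Search(query, null, 10);

        List<string> results = new List<string>();

        foreach(var scoreDoc in topDocs.scoreDocs)
        {
            results.Add(searcher.Doc(scoreDoc.doc).Get("raw"));
        }

        return results.ToArray();
    }
    finally
    {
        // Release the index files once the search is done.
        searcher.Close();
    }
}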
Timothy Strimple asked May 31 '11 19:05


1 Answer

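Are you sure you used Lucene.Net.Analysis.Snowball.SnowballAnalyzer("English") to write your index? You have to use the same analyzer to write and query the index. If the index was written with StandardAnalyzer, it contains unstemmed terms such as "apple", while the Snowball English stemmer reduces the query "apple" to "appl", which matches nothing. "eggs" is stemmed to "egg", which happens to coincide with the unstemmed term "egg" in the index, so that search still works.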
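To see the effect in isolation, here is a minimal sketch, assuming Lucene.NET 2.9 with the Snowball contrib assembly referenced. It indexes a single description into an in-memory RAMDirectory with one analyzer and queries it with another. The class name AnalyzerMismatchSketch and the helper CountHits are hypothetical names made up for this illustration; the "text" field name is taken from the question's code.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Snowball;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper class, for illustration only.
static class AnalyzerMismatchSketch
{
    // Indexes one description with indexAnalyzer, then counts the hits for
    // searchTerm when the query is parsed with queryAnalyzer.
    public static int CountHits(Analyzer indexAnalyzer, Analyzer queryAnalyzer,
                                string description, string searchTerm)
    {
        var directory = new RAMDirectory();

        // Write a one-document index using the index-time analyzer.
        var writer = new IndexWriter(directory, indexAnalyzer, true,
                                     IndexWriter.MaxFieldLength.UNLIMITED);
        var doc = new Document();
        doc.Add(new Field("text", description, Field.Store.NO, Field.Index.ANALYZED));
        writer.AddDocument(doc);
        writer.Close();

        // Parse the query using the query-time analyzer and search.
        var parser = new QueryParser(Version.LUCENE_29, "text", queryAnalyzer);
        Query query = parser.Parse(searchTerm);

        var searcher = new IndexSearcher(directory, true);
        try
        {
            return searcher.Search(query, null, 10).scoreDocs.Length;
        }
        finally
        {
            searcher.Close();
        }
    }

    static void Main()
    {
        Analyzer snowball = new SnowballAnalyzer("English");
        Analyzer standard = new StandardAnalyzer(Version.LUCENE_29);

        // Same analyzer on both sides: "apple" is stemmed identically at index and query time.
        System.Console.WriteLine(CountHits(snowball, snowball, "Croissants, apple", "apple"));

        // Mismatch: the index holds "apple", but the stemmed query term is "appl".
        System.Console.WriteLine(CountHits(standard, snowball, "Croissants, apple", "apple"));
    }
}
```

With matching analyzers the first call should report one hit; with the StandardAnalyzer index and the Snowball query, the second should report none. Rebuilding the ./index/ directory with the Snowball analyzer, rather than only switching the analyzer in Search, is the corresponding fix for the prototype.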

mathieu answered Oct 11 '22 16:10