Lucene.NET 4.8 search not returning results

After an upgrade from Lucene 3.X to 4.8, a couple of things had to be rewritten to make everything function again.

I've tried multiple complete solutions (adjusted for our situation) from different tutorials, and many different tweaks and tests, but am unable to find what the actual problem is with the code below.

Starting off with the code

The code for adding the fields to a document now looks like this, after changing the fields from generic types to the specific StringField type:

Document document = new Document
{
    new StringField("productName", product.Name, Field.Store.YES),
    new StringField("productDescription", product.Description, Field.Store.YES),
    new StringField("productCategory", product.Category, Field.Store.YES)
};

The search part of the code looks like this:

Analyzer analyzer = new StandardAnalyzer(Version);
IndexReader reader = DirectoryReader.Open(indexDirectory);
IndexSearcher searcher = new IndexSearcher(reader);
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version,
    new[] { "productName", "productCategory", "productDescription" },
    analyzer,
    new Dictionary<string, float> {
        { "productName", 20 },
        { "productCategory", 5 },
        { "productDescription", 1 }
    }
); 

ScoreDoc[] hits = searcher.Search(parser.Parse(searchTerm))?.ScoreDocs;

The problem

When searching with only a wildcard character, the search correctly returns everything, so the indexing part seems to work fine. However, if I try to find the following product with any search term, nothing is found at all.

Example product information

  • Name: Tafelrok
  • Description: Tafelrok
  • Category: Tafels & Stoelen

I've tried with 'Tafelrok', 'tafelrok', 'Tafel', 'tafel', 'afel', 'afe' etc. The last term should hit all 3 fields partially, while the first is a complete match against multiple fields.

I've also tried changing the parser.Parse(searchTerm) bit to include wildcards ("*" + searchTerm + "*"), but nothing changes.

I'm clearly missing something here. Any ideas why the search is broken?

Kevin Sijbers asked Feb 20 '18 18:02


1 Answer

You need to configure your fields appropriately, choose the right analyzers for indexing and searching, and use the correct query syntax.

StringField instances in a Document behave like keywords: they are not analyzed, so each value is indexed as-is, as a single term in its original case. StandardAnalyzer, however, applies a lowercase filter to the query, so a search for Tafelrok is turned into tafelrok, which does not match the indexed term Tafelrok. You can fix this by using KeywordAnalyzer with your query parser. When a field needs to be analyzed (the product description, for example), use TextField instead. Finally, in order to match partial terms you need to use wildcards (* or ?).
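As a minimal sketch of the TextField route (not the asker's actual code), the example below indexes the product from the question into an in-memory RAMDirectory and uses the same StandardAnalyzer for indexing and for query parsing. The class name AnalyzedFieldSketch, the method CountHits, the hard-coded product values and the omission of the field boosts are assumptions made for illustration. It also assumes the Lucene.Net, Lucene.Net.Analysis.Common and Lucene.Net.QueryParser 4.8 packages are referenced.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper class, for illustration only.
public static class AnalyzedFieldSketch
{
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

    private static readonly string[] SearchFields =
        { "productName", "productCategory", "productDescription" };

    // Indexes the example product with analyzed fields and returns
    // the number of documents matching queryText.
    public static int CountHits(string queryText)
    {
        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(AppLuceneVersion))
        {
            var config = new IndexWriterConfig(AppLuceneVersion, analyzer);
            using (var writer = new IndexWriter(directory, config))
            {
                // TextField values are run through the analyzer, so they are
                // split into tokens and lowercased at index time.
                writer.AddDocument(new Document
                {
                    new TextField("productName", "Tafelrok", Field.Store.YES),
                    new TextField("productDescription", "Tafelrok", Field.Store.YES),
                    new TextField("productCategory", "Tafels & Stoelen", Field.Store.YES)
                });
                writer.Commit();
            }

            using (var reader = DirectoryReader.Open(directory))
            {
                var searcher = new IndexSearcher(reader);

                // The same analyzer lowercases the query text, so query terms
                // and indexed terms now agree on case.
                var parser = new MultiFieldQueryParser(AppLuceneVersion, SearchFields, analyzer);
                Query query = parser.Parse(queryText);

                return searcher.Search(query, 10).TotalHits;
            }
        }
    }
}
```

With this setup, CountHits("Tafelrok") and CountHits("tafelrok") should both find the product, while CountHits("afel") should find nothing, because a bare term only matches whole indexed tokens.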

For more information check:

  • Analyzers in Lucene.Net article by Aaron Powell
  • Separate IndexableFieldType from Field instances section of Apache Lucene Migration Guide
  • Lucene field, StringField vs TextField
  • Solr Text field and String field - different search behaviour discussion on StackOverflow
  • Apache Lucene - Query Parser Syntax documentation page
  • Document search on partial words discussion on StackOverflow
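To illustrate the point about partial terms, here is a small sketch of wildcard queries against an analyzed field. It is an assumption-laden example rather than a drop-in fix: the class name WildcardSketch, the method CountHits, the single productName field and the in-memory RAMDirectory are all made up for the demonstration. Note that the classic query parser rejects a leading * or ? unless AllowLeadingWildcard is enabled.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper class, for illustration only.
public static class WildcardSketch
{
    private const LuceneVersion AppLuceneVersion = LuceneVersion.LUCENE_48;

    // Indexes a single product name and returns the number of documents
    // matching queryText, which may contain * or ? wildcards.
    public static int CountHits(string queryText)
    {
        using (var directory = new RAMDirectory())
        using (var analyzer = new StandardAnalyzer(AppLuceneVersion))
        {
            var config = new IndexWriterConfig(AppLuceneVersion, analyzer);
            using (var writer = new IndexWriter(directory, config))
            {
                // Indexed as the single lowercase token "tafelrok".
                writer.AddDocument(new Document
                {
                    new TextField("productName", "Tafelrok", Field.Store.YES)
                });
                writer.Commit();
            }

            using (var reader = DirectoryReader.Open(directory))
            {
                var parser = new QueryParser(AppLuceneVersion, "productName", analyzer)
                {
                    // Without this, a query such as "*afe*" throws a ParseException.
                    AllowLeadingWildcard = true
                };
                Query query = parser.Parse(queryText);

                return new IndexSearcher(reader).Search(query, 10).TotalHits;
            }
        }
    }
}
```

Here CountHits("tafel*") and CountHits("*afe*") should both match the product, while the bare term "afe" should not. Leading wildcards force Lucene to scan every term in the field, so they can be slow on large indexes.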
Leonid Vasilev answered Oct 17 '22 21:10