
Lucene.net multi field searches

To get more contextually relevant search results, I've decided to try Lucene.Net. I'm very new to it, and I haven't found it to be the most intuitive library I've come across. The lack of relevant examples out there hasn't helped me figure it out.

I'm using simple lucene to build my index and that seems to be working perfectly:

Field f = null;
Document document = new Document();

document.Add(new Field("id", dl.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));

f = new Field("category", dl.CategoryName.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
f.SetBoost(5);
document.Add(f);

f = new Field("company_name", dl.CompanyName.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS);
f.SetBoost(2);
document.Add(f);

document.Add(new Field("description", dl.Description.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document.Add(new Field("meta_keywords", dl.Meta_Keywords.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
document.Add(new Field("meta_description", dl.Meta_Description.ToLowerInvariant(), Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));

//And a few more fields

Based on this index I first tried a query along these lines:

var whatParser = new MultiFieldQueryParser(
    global::Lucene.Net.Util.Version.LUCENE_29,
    new string[] { "company_name", "description", "meta_keywords", "meta_description", "category" },
    analyzer);

whatQuery = whatParser.Parse("search".ToLowerInvariant());

This worked well until the search term contained more than one word. Next I tried a phrase query:

whatQuery = new PhraseQuery();
whatQuery.Add(new Term("company_name", what));
whatQuery.Add(new Term("description", what));
whatQuery.Add(new Term("meta_keywords", what));
whatQuery.Add(new Term("meta_description", what));
whatQuery.Add(new Term("category", what));

This threw the error: All phrase terms must be in the same field

So, where am I going wrong? Do you have any suggestions on how to fix it? I'm open to changing the search technology entirely if there are better suggestions out there.

Some additional information which may be useful

  • All results are sorted in the end via new Sort(new SortField[] {new SortField("is_featured", SortField.STRING, true),SortField.FIELD_SCORE})
  • There are some additional search criteria so each query is added to a Boolean query with occur set to SHOULD

Thanks for your help.

Hawxby asked Feb 24 '11 00:02



1 Answer

I think the BooleanClause.Occur.SHOULD is the issue. We use it like this:

// For us the field list varies; there are other ways to create this array, of course.
string[] fieldList = { "field1", "field2", "field3" };

// One Occur flag per field. With every flag set to SHOULD, a match in any one field is enough.
List<BooleanClause.Occur> occurs = new List<BooleanClause.Occur>();
foreach (string field in fieldList)
    occurs.Add(BooleanClause.Occur.SHOULD);

if(!string.IsNullOrEmpty(multiWordPhrase))
{
    Query q = MultiFieldQueryParser.Parse(multiWordPhrase, fieldList, occurs.ToArray(), new StandardAnalyzer());
    return q;
}
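
On the PhraseQuery error from the question: a PhraseQuery matches consecutive terms within a single field, and each Term is expected to hold one indexed token rather than the whole search string. One way around that is to build a separate PhraseQuery per field and combine them in a BooleanQuery. The following is only a minimal sketch, assuming the Lucene.Net 2.9 API used in the question; `MultiFieldPhrase` and `Build` are hypothetical names, and the whitespace split is a stand-in for running the text through the same analyzer that built the index.

```csharp
using System;
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class MultiFieldPhrase
{
    // Hypothetical helper (not part of Lucene.Net): builds one PhraseQuery per
    // field and combines them with SHOULD, so a phrase match in any field is a hit.
    public static Query Build(string[] fields, string phrase)
    {
        // Assumption: lowercasing and splitting on spaces approximates the tokens
        // the index-time analyzer produced. Stop words or punctuation in the input
        // would need real analysis instead.
        string[] tokens = phrase.ToLowerInvariant()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        BooleanQuery combined = new BooleanQuery();
        foreach (string field in fields)
        {
            PhraseQuery perField = new PhraseQuery();
            foreach (string token in tokens)
            {
                // Every term added to this PhraseQuery uses the same field,
                // which avoids "All phrase terms must be in the same field".
                perField.Add(new Term(field, token));
            }
            combined.Add(perField, BooleanClause.Occur.SHOULD);
        }
        return combined;
    }
}
```

It could then be called with the field names from the question, for example `MultiFieldPhrase.Build(new[] { "company_name", "description", "category" }, what)`, and the result added to the outer Boolean query like any other clause.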
misteraidan answered Nov 14 '22 19:11
