
Lucene index with multiple fields of the same nature

Each Lucene doc is a recipe, and each of these recipes has ingredients.

I'm working towards being able to search the ingredients and give a result that says, for example, that two ingredients matched out of four.

So how can I add the ingredients to the doc? In Solr I can just create multiple fields with the same name and it would save them all. I might be doing something wrong, because it's only saving one ingredient.

Also this would apply to a field like 'tags'.

P.S. I'm using the Zend Framework for this, if it matters at all.

asked Mar 09 '10 15:03 by bluedaniel


1 Answer

Lucene documents support adding multiple fields with the same name; that is, you can repeatedly call:

document.add(new Field("name", value))

So if you were to do:

# (pseudo-code)
document1.add(new Field("ingredient", "vanilla"))
document1.add(new Field("ingredient", "strawberry"))
index.add(document1)

# And then search for
index.search("ingredient", "vanilla" && "strawberry")

You will get back document1. But if you search for:

index.search("ingredient", "vanilla" && "apple")

You will not get back document1.

If you searched for:

index.search("ingredient", "vanilla" || "apple")

You would also get back document1.
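
The three outcomes above can be sketched in plain Python by treating a multi-valued field as the set of its values. This only illustrates the boolean semantics and is not Lucene API: the `matches_all` and `matches_any` helpers are hypothetical names, and the sketch assumes each value is indexed as a single untokenized term.

```python
def matches_all(field_values, terms):
    """AND query: every term must appear among the field's values."""
    return set(terms) <= set(field_values)


def matches_any(field_values, terms):
    """OR query: at least one term must appear among the field's values."""
    return bool(set(terms) & set(field_values))


# One document whose "ingredient" field was added twice.
document1 = {"ingredient": ["vanilla", "strawberry"]}

print(matches_all(document1["ingredient"], ["vanilla", "strawberry"]))  # True
print(matches_all(document1["ingredient"], ["vanilla", "apple"]))       # False
print(matches_any(document1["ingredient"], ["vanilla", "apple"]))       # True
```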

If you want to see which ingredients matched, you can save the fields on the document as stored fields. Then, for each matching document, retrieve the stored values of the field and compare them to the terms in the user's query.
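
As a rough sketch of that comparison step, the snippet below takes the stored ingredient values of one matching document and reports how many of them appear in the query. It is plain Python rather than Lucene or Zend code. The `matched_ingredients` helper and the sample data are hypothetical, and the lower-casing is an assumption about how the field is analyzed.

```python
def matched_ingredients(stored_values, query_terms):
    """Return the stored values that also appear in the query, in stored order."""
    # Assumption: matching is case-insensitive and ignores surrounding spaces.
    wanted = {term.strip().lower() for term in query_terms}
    return [value for value in stored_values if value.strip().lower() in wanted]


# Hypothetical stored field values, as retrieved from one matching document.
stored = ["vanilla", "strawberry", "sugar", "cream"]
query = ["Vanilla", "sugar", "apple"]

hits = matched_ingredients(stored, query)
print(f"{len(hits)} of {len(stored)} ingredients matched: {hits}")
# prints: 2 of 4 ingredients matched: ['vanilla', 'sugar']
```

Whether "out of four" should count the recipe's ingredients (as here) or the query's terms is a design choice; swapping `len(stored)` for `len(query)` gives the other reading.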

Also note that, by default, the PositionIncrementGap for fields with the same name that are added to a document is 0.

This means that if you added:

   document1.add(new Field("ingredient", "chocolate"))
   document1.add(new Field("ingredient", "orange"))

then the two values would be treated as if they were a single ingredient called "chocolate orange", which might match on:

index.search("ingredient", "chocolate orange")

You can avoid this by setting PositionIncrementGap to a value greater than 1, which will yield:

0 matches for:

index.search("ingredient", "chocolate orange")

and 1 match for:

index.search("ingredient", "chocolate" &&  "orange")
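
To make the effect of the gap concrete, here is a simplified model of position assignment, written in plain Python rather than Lucene code. It assumes whitespace tokenization and an exact (zero-slop) phrase query; `assign_positions` and `phrase_matches` are hypothetical helpers, and the gap value of 100 is just an example.

```python
def assign_positions(values, gap):
    """Give every token of a multi-valued field a position.

    Tokens within one value are consecutive; `gap` extra positions are
    skipped between the last token of one value and the first of the next.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


def phrase_matches(positions, phrase):
    """Exact phrase match: the phrase tokens must sit at consecutive positions."""
    words = phrase.split()
    if not words:
        return False
    index = {}
    for token, pos in positions:
        index.setdefault(token, set()).add(pos)
    return any(
        all(start + offset in index.get(word, ()) for offset, word in enumerate(words))
        for start in index.get(words[0], ())
    )


values = ["chocolate", "orange"]
print(phrase_matches(assign_positions(values, 0), "chocolate orange"))    # True
print(phrase_matches(assign_positions(values, 100), "chocolate orange"))  # False
```

With a gap of 0 the two values land on adjacent positions, so the phrase matches across the value boundary; with a larger gap they no longer sit next to each other, while a boolean query on the individual terms is unaffected.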

answered Nov 20 '22 03:11 by Joel