How to get more out of Lucene.net

Tags:

lucene.net

I'm trying to incorporate Lucene.net in my web search.

Currently I have a Lucene.net index that contains more than 1 million documents with 7 fields each. The last field is the "all" field, which holds the content of the other six fields concatenated. Searching the "all" field is just EXTREMELY fast :)

But I feel there is more to be found here. How can I run a search for one or more space-separated strings over all the fields without using the "all" field?
I want to be able to give weights to certain fields. Furthermore, it would be really nice if the results included information on WHERE each hit took place, so I can show it in the result.

I think this is all possible, but I don't immediately see how.
Any help?

Boris Callens asked Feb 10 '09 13:02


1 Answer

We do something similar. The trick is to specify fields in your query string:

(+Tier1:ribbon^1)^4 OR (+Tier2:ribbon^1)^4 OR (+Tier3:ribbon^1) OR (+Tier4:q*ribbon*^1)^12

In the above example, the user searched for "ribbon" in our application. We have different segments of data in different fields, and the final field, "Tier4", contains all the previous terms concatenated together. We also prepend the field with a "q", so that we can do leading wildcards (which the query parser does not allow by default):

(+Tier4:q*ribbon*^1)^12

Lastly, we use boosts with the caret (^), so that a match in one field counts for more than a match in another. It took a while to get boosts right, and I'm still not 100% happy with them, but they do make a big impact.
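To make the pattern concrete, here is a minimal sketch of how a query string like the one above could be assembled from user input. It only builds the string; it does not call Lucene.net. The field names and boosts are copied from the example query, while `build_query`, `FIELD_BOOSTS` and `WILDCARD_FIELD` are hypothetical names invented for this illustration. How several space-separated terms are combined (every term required within each field group) is also an assumption, since the example only shows a single term.

```python
# Field names and group boosts copied from the example query above.
# None means the group carries no boost of its own (as with Tier3).
FIELD_BOOSTS = [("Tier1", 4), ("Tier2", 4), ("Tier3", None), ("Tier4", 12)]

# The concatenated field that is searched with the "q" prefix and wildcards.
WILDCARD_FIELD = "Tier4"


def build_query(user_input):
    """Build a fielded, boosted Lucene query string from space-separated terms.

    Assumption: every term is required (+) inside each field group.
    Note: special query-syntax characters in the input are NOT escaped here.
    """
    terms = user_input.split()
    if not terms:
        return ""

    groups = []
    for field, boost in FIELD_BOOSTS:
        clauses = []
        for term in terms:
            if field == WILDCARD_FIELD:
                # The literal "q" comes first, so the pattern does not
                # start with a wildcard character.
                clauses.append(f"+{field}:q*{term}*^1")
            else:
                clauses.append(f"+{field}:{term}^1")
        group = "(" + " ".join(clauses) + ")"
        if boost is not None:
            group += f"^{boost}"
        groups.append(group)

    # A match in any one field group is enough for a hit.
    return " OR ".join(groups)


print(build_query("ribbon"))
```

For the single term "ribbon" this reproduces the query shown at the top of this answer. Real user input should be escaped before it goes into the string, because characters such as `:`, `^` and `*` have meaning in the query syntax.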

Bob King answered Sep 24 '22 21:09