I've got my searcher working really well, however it does tend to return results that are obsolete. My site is much like NerdDinner whereby events in the past become irrelevant.
I'm currently indexing like this
note: my example is in VB.NET, but I don't care if examples are given in C#
Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex
Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False)
Dim doc As Document = New Document
doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED))
doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED))
doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing,
"User" & searchableEvent.User.ID,
searchableEvent.User.UserName),
Field.Store.YES,
Field.Index.TOKENIZED))
doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED))
doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED))
doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED))
writer.AddDocument(doc)
writer.Optimize()
writer.Close()
Return True
End Function
Notice how I have a "date" index that stores the event date.
My search then looks like this
''# code omitted
Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
Dim searcher As IndexSearcher = New IndexSearcher(reader)
Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer())
Dim query As Query = parser.Parse(q.ToLower)
''# We're using 10,000 as the maximum number of results to return
''# because I have a feeling that we'll never reach that full amount
''# anyways. And if we do, who in their right mind is going to page
''# through all of the results?
Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
Dim doc As Document = Nothing
''# loop through the topDocs and grab the appropriate 10 results based
''# on the submitted page number
While i <= last AndAlso i < topDocs.totalHits
doc = searcher.Doc(topDocs.scoreDocs(i).doc)
IDList.Add(doc.[Get]("id"))
i += 1
End While
''# code omitted
I did try the following, but it was to no avail (threw a NullReferenceException).
While i <= last AndAlso i < topDocs.totalHits
If Date.Parse(doc.[Get]("date")) >= Date.Today Then
doc = searcher.Doc(topDocs.scoreDocs(i).doc)
IDList.Add(doc.[Get]("id"))
i += 1
End If
End While
I also found the following documentation, but I can't make heads or tails of it
http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html
You're linking to the api documentation of Lucene 1.4.3. Lucene.Net is currently at 2.9.2. I think an upgrade is due.
First, you're using Store.Yes alot. Stored fields will make your index larger, which may be a performance issue. Your date problem can easily be solved by storing dates as strings in the format of "yyyyMMddHHmmssfff" (that's really high resolution, down to milliseconds). You may want to reduce the resolution to create fewer tokens to reduce your index size.
var dateValue = DateTools.DateToString(searchableEvent.EventDate, DateTools.Resolution.MILLISECOND);
doc.Add(new Field("date", dateValue, Field.Store.YES, Field.Index.NOT_ANALYZED));
Then you apply a filter to your search (the second parameter, where you currently pass in Nothing/null).
var dateValue = DateTools.DateToString(DateTime.Now, DateTools.Resolution.MILLISECOND);
var filter = FieldCacheRangeFilter.NewStringRange("date",
lowerVal: dateValue, includeLower: true,
upperVal: null, includeUpper: false);
var topDocs = searcher.Search(query, filter, 10000);
You can do this using a BooleanQuery combining your normal query with a RangeQuery, but that would also affect scoring (which is calculated on the query, not the filter). You may also want to avoid modifying the query for simplicity, so you know what query is executed.
You can combine multiple queries with a BooleanQuery
. Since Lucene only searches text note that the date field in your index must be ordered by the most significant to the least significant part of the date, i.e. in IS8601 format ("2010-11-02T20:49:16.000000+00:00")
Example:
Lucene.Net.Index.Term searchTerm = new Lucene.Net.Index.Term("fullText", searchTerms);
Lucene.Net.Index.Term dateRange = new Lucene.Net.Index.Term("date", "2010*");
Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);
Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.WildcardQuery(dateRange);
Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
query.Add(termQuery, BooleanClause.Occur.MUST);
query.Add(dateRangeQuery, BooleanClause.Occur.MUST);
Alternatively if a wildcard is not precise enough you can add a RangeQuery
instead:
Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);
Lucene.Net.Index.Term date1 = new Lucene.Net.Index.Term("date", "2010-11-02*");
Lucene.Net.Index.Term date2 = new Lucene.Net.Index.Term("date", "2010-11-03*");
Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.RangeQuery(date1, date2, true);
Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
query.Add(termQuery, BooleanClause.Occur.MUST);
query.Add(dateRangeQuery, BooleanClause.Occur.MUST);
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With