Lucene.Net: How can I add a date filter to my search results?

I've got my searcher working really well; however, it does tend to return results that are obsolete. My site is much like NerdDinner, in that events in the past become irrelevant.

I'm currently indexing like this.
Note: my example is in VB.NET, but I don't mind if examples are given in C#.

    Public Function AddIndex(ByVal searchableEvent As [Event]) As Boolean Implements ILuceneService.AddIndex

        Dim writer As New IndexWriter(luceneDirectory, New StandardAnalyzer(), False)

        Dim doc As Document = New Document

        doc.Add(New Field("id", searchableEvent.ID, Field.Store.YES, Field.Index.UN_TOKENIZED))
        doc.Add(New Field("fullText", FullTextBuilder(searchableEvent), Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("user", If(searchableEvent.User.UserName = Nothing,
                                     "User" & searchableEvent.User.ID,
                                     searchableEvent.User.UserName),
                                 Field.Store.YES,
                                 Field.Index.TOKENIZED))
        doc.Add(New Field("title", searchableEvent.Title, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("location", searchableEvent.Location.Name, Field.Store.YES, Field.Index.TOKENIZED))
        doc.Add(New Field("date", searchableEvent.EventDate, Field.Store.YES, Field.Index.UN_TOKENIZED))

        writer.AddDocument(doc)

        writer.Optimize()
        writer.Close()
        Return True

    End Function

Notice how I have a "date" field that stores the event date.

My search then looks like this:

''# code omitted
        Dim reader As IndexReader = IndexReader.Open(luceneDirectory)
        Dim searcher As IndexSearcher = New IndexSearcher(reader)
        Dim parser As QueryParser = New QueryParser("fullText", New StandardAnalyzer())
        Dim query As Query = parser.Parse(q.ToLower)

        ''# We're using 10,000 as the maximum number of results to return
        ''# because I have a feeling that we'll never reach that full amount
        ''# anyways.  And if we do, who in their right mind is going to page
        ''# through all of the results?
        Dim topDocs As TopDocs = searcher.Search(query, Nothing, 10000)
        Dim doc As Document = Nothing

        ''# loop through the topDocs and grab the appropriate 10 results based
        ''# on the submitted page number
        While i <= last AndAlso i < topDocs.totalHits
                doc = searcher.Doc(topDocs.scoreDocs(i).doc)
                IDList.Add(doc.[Get]("id"))
                i += 1
        End While
''# code omitted

I did try the following, but to no avail (it threw a NullReferenceException).

        While i <= last AndAlso i < topDocs.totalHits
            If Date.Parse(doc.[Get]("date")) >= Date.Today Then
                doc = searcher.Doc(topDocs.scoreDocs(i).doc)
                IDList.Add(doc.[Get]("id"))
                i += 1
            End If
        End While

I also found the following documentation, but I can't make heads or tails of it
http://lucene.apache.org/java/1_4_3/api/org/apache/lucene/search/DateFilter.html

Chase Florell asked Dec 30 '10 18:12


2 Answers

You're linking to the API documentation of Lucene 1.4.3. Lucene.Net is currently at 2.9.2. I think an upgrade is due.

First, you're using Field.Store.YES a lot. Stored fields make your index larger, which may become a performance issue. Your date problem can easily be solved by storing dates as strings in the format "yyyyMMddHHmmssfff" (that's really high resolution, down to milliseconds). You may want to reduce the resolution so that fewer distinct terms are created, which reduces your index size.

var dateValue = DateTools.DateToString(searchableEvent.EventDate, DateTools.Resolution.MILLISECOND);
doc.Add(new Field("date", dateValue, Field.Store.YES, Field.Index.NOT_ANALYZED));
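
To make the resolution trade-off concrete, here is a minimal sketch that uses only plain .NET formatting, not Lucene.Net's DateTools. The class DateTermSketch and the helper ToTerm are hypothetical names for illustration, and time zones are ignored. It counts how many distinct terms three sample events produce at millisecond resolution versus day resolution.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class DateTermSketch
{
    // Hypothetical helper (not part of Lucene.Net): formats a DateTime
    // as a fixed-width, most-significant-first string.
    public static string ToTerm(DateTime value, string format)
    {
        return value.ToString(format, CultureInfo.InvariantCulture);
    }

    public static void Main()
    {
        // Sample data: two events on the same day, one on the next day.
        var events = new[]
        {
            new DateTime(2010, 12, 30, 18, 12, 0),
            new DateTime(2010, 12, 30, 21, 45, 0),
            new DateTime(2010, 12, 31, 9, 0, 0),
        };

        // Every distinct string becomes a distinct term in the index.
        int msTerms = events.Select(e => ToTerm(e, "yyyyMMddHHmmssfff")).Distinct().Count();
        int dayTerms = events.Select(e => ToTerm(e, "yyyyMMdd")).Distinct().Count();

        Console.WriteLine(msTerms + " millisecond-resolution terms, " + dayTerms + " day-resolution terms");
    }
}
```

If filtering by day is enough for an event site, the coarser format keeps the number of distinct date terms bounded by the number of days rather than the number of events.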

Then you apply a filter to your search (the second parameter, where you currently pass in Nothing/null).

var dateValue = DateTools.DateToString(DateTime.Now, DateTools.Resolution.MILLISECOND);
var filter = FieldCacheRangeFilter.NewStringRange("date", 
                 lowerVal: dateValue, includeLower: true, 
                 upperVal: null, includeUpper: false);
var topDocs = searcher.Search(query, filter, 10000);
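
What a string range with an open upper bound does can be sketched without Lucene.Net at all. The snippet below is an illustration only: RangeFilterSketch and InRange are hypothetical names, the sample IDs and dates are made up, and it assumes terms are compared in ordinal order, which is harmless for all-digit date strings. A null bound means "unbounded on that side", mirroring the upperVal: null argument above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeFilterSketch
{
    // Hypothetical stand-in for a string range check; null means "no bound".
    public static bool InRange(string term, string lower, bool includeLower,
                               string upper, bool includeUpper)
    {
        if (lower != null)
        {
            int c = string.CompareOrdinal(term, lower);
            if (c < 0 || (c == 0 && !includeLower)) return false;
        }
        if (upper != null)
        {
            int c = string.CompareOrdinal(term, upper);
            if (c > 0 || (c == 0 && !includeUpper)) return false;
        }
        return true;
    }

    public static void Main()
    {
        // Made-up event IDs mapped to their "yyyyMMddHHmmssfff" date terms.
        var dateTermsById = new Dictionary<string, string>
        {
            { "past",     "20101101090000000" },
            { "tonight",  "20101230210000000" },
            { "nextYear", "20110115120000000" },
        };

        // Pretend "now" is Dec 30 2010, 18:12.
        string now = "20101230181200000";

        var upcoming = dateTermsById
            .Where(p => InRange(p.Value, now, true, null, false))
            .Select(p => p.Key)
            .OrderBy(id => id, StringComparer.Ordinal)
            .ToArray();

        Console.WriteLine(string.Join(", ", upcoming)); // nextYear, tonight
    }
}
```

Because the comparison is purely on strings, it only works when every indexed date uses the same fixed-width format as the bound.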

You can also do this with a BooleanQuery that combines your normal query with a RangeQuery, but that would affect scoring (which is calculated on the query, not the filter). Leaving the query unmodified is also simpler, because you always know exactly which query is executed.

sisve answered Oct 22 '22 12:10


You can combine multiple queries with a BooleanQuery. Since Lucene only searches text, the date field in your index must be ordered from the most significant to the least significant part of the date, i.e. in ISO 8601 format ("2010-11-02T20:49:16.000000+00:00").

Example:

Lucene.Net.Index.Term searchTerm = new Lucene.Net.Index.Term("fullText", searchTerms);
Lucene.Net.Index.Term dateRange = new Lucene.Net.Index.Term("date", "2010*");

Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);
Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.WildcardQuery(dateRange);

Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
query.Add(termQuery, BooleanClause.Occur.MUST);
query.Add(dateRangeQuery, BooleanClause.Occur.MUST);

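Alternatively, if a wildcard is not precise enough, you can add a RangeQuery instead. Its bounds are literal terms compared as strings, so a trailing * would be treated as an ordinary character rather than a wildcard:

Lucene.Net.Search.Query termQuery = new Lucene.Net.Search.TermQuery(searchTerm);
// A bare date prefix sorts before any timestamp of the same day,
// so this inclusive range covers all of 2010-11-02.
Lucene.Net.Index.Term date1 = new Lucene.Net.Index.Term("date", "2010-11-02");
Lucene.Net.Index.Term date2 = new Lucene.Net.Index.Term("date", "2010-11-03");
Lucene.Net.Search.Query dateRangeQuery = new Lucene.Net.Search.RangeQuery(date1, date2, true);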

Lucene.Net.Search.BooleanQuery query = new Lucene.Net.Search.BooleanQuery();
query.Add(termQuery, BooleanClause.Occur.MUST);
query.Add(dateRangeQuery, BooleanClause.Occur.MUST);
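
The ordering requirement mentioned at the start of this answer can be checked with plain string comparison. The following is a rough sketch, not Lucene.Net code: IsoOrderingSketch and Between are hypothetical names, and ordinal comparison is assumed to approximate how range bounds are compared against terms. It shows that an ISO 8601 range behaves chronologically, while a day-first format lets an out-of-range date slip through.

```csharp
using System;

public static class IsoOrderingSketch
{
    // Hypothetical helper: inclusive range check on plain strings.
    public static bool Between(string term, string lower, string upper)
    {
        return string.CompareOrdinal(term, lower) >= 0
            && string.CompareOrdinal(term, upper) <= 0;
    }

    public static void Main()
    {
        // ISO 8601 (most significant part first): string order matches date order.
        Console.WriteLine(Between("2010-11-02T20:49:16", "2010-11-02", "2010-11-03")); // True
        Console.WriteLine(Between("2010-11-04T08:00:00", "2010-11-02", "2010-11-03")); // False

        // Day-first format: 2 Dec 2009 wrongly falls "between" 2 Nov 2010 and 3 Nov 2010,
        // because the comparison looks at the day digits before the month and year.
        Console.WriteLine(Between("02/12/2009", "02/11/2010", "03/11/2010")); // True
    }
}
```

The last line is the failure case: the comparison never gets as far as the year, which is why the indexed format has to put the year first.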

BrokenGlass answered Oct 22 '22 12:10