
How to detect a date in a Lucene free text search query?

Tags:

lucene

We're using Lucene to develop a free text search box for data delivered to a user, as in the case of an email Inbox. We'd like to allow for the box to handle dates, for instance 5/1/2011. To make things easier, we are limiting the current version of the feature to just two date formats:

mm/dd/yy
mm/dd/yyyy
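As a rough illustration of what recognizing these two formats in a raw query string can look like, here is a minimal sketch using only `java.util.regex`. The class name `QueryDateFinder` and its method are hypothetical, not part of Lucene, and the range check is deliberately loose (it does not verify days per month).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Lucene class): scans a free-text query for
// tokens shaped like mm/dd/yy or mm/dd/yyyy.
class QueryDateFinder {
    // One- or two-digit month and day, then a four- or two-digit year.
    // The lookarounds keep the pattern from matching inside a longer
    // run of digits or slashes, such as "5/1/201" or "1/2/3/2011".
    private static final Pattern DATE = Pattern.compile(
        "(?<![\\d/])(\\d{1,2})/(\\d{1,2})/(\\d{4}|\\d{2})(?![\\d/])");

    // Returns the date-like substrings in the order they appear.
    static List<String> findDates(String query) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(query);
        while (m.find()) {
            int month = Integer.parseInt(m.group(1));
            int day = Integer.parseInt(m.group(2));
            // Loose sanity check only; 2/31/2011 would still pass here.
            if (month >= 1 && month <= 12 && day >= 1 && day <= 31) {
                dates.add(m.group());
            }
        }
        return dates;
    }
}
```

Calling `QueryDateFinder.findDates(queryString)` before handing the string to the query parser gives the list of candidate dates; what to do with them (for example, rewriting them into a range clause) is left open here.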

For our prototype, we hacked the query analysis process to pre-process the query string and look for these two date patterns. That was about two years ago, when we were on Lucene 2.4. I'm curious whether Lucene has any out-of-the-box tools that accept a DateFormat and return a TokenStream with any identified dates. Looking through the javadocs for Lucene 2.9, I found the class:

org.apache.lucene.analysis.sinks.DateRecognizerSinkFilter

which seems to do what I need, but it is a SinkFilter, a concept that doesn't seem to be documented in the Lucene Wiki. Has anyone used this filter before, and if so, what is the most effective way to use it?

Peter Bratton asked Nov 13 '22 21:11


1 Answer

There is a bit of sample code (which is, admittedly, over-complicated) in the documentation for TeeSinkTokenFilter. Note that, as designed, DateRecognizerSinkFilter does not store the actual date; it only detects that a token is a date conforming to the specified format. What I would try is to re-implement the DateRecognizerSinkFilter class so that it takes an array of DateFormat instances, create a new Attribute class called DateAttribute (or something similar), and have the re-implemented recognizer set the parsed date into the DateAttribute whenever one of its formats matches. That way, you can always test whether you have a valid date by interrogating the DateAttribute, and you keep the date formats in one class. Another advantage is that you won't have to handle multiple sinks, which simplifies the code compared with the TeeSinkTokenFilter example.
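The core of that suggestion, trying several formats and keeping the parsed date rather than just a yes/no answer, can be sketched without any Lucene classes. The class `MultiFormatDateRecognizer` below is hypothetical and uses only `java.text`; wiring its result into a custom DateAttribute inside a token filter is not shown and would depend on the Lucene version in use.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch (not a Lucene class): holds several date formats
// and returns the parsed Date for the first one that matches a token.
// SimpleDateFormat is not thread-safe, so use one instance per thread.
class MultiFormatDateRecognizer {
    private final List<SimpleDateFormat> formats = new ArrayList<>();

    // Patterns use SimpleDateFormat syntax, where "MM" is the month
    // ("mm" would mean minutes). Order matters: under "yy" a four-digit
    // year is read literally, while "yyyy" would read "11" as year 11.
    MultiFormatDateRecognizer(String... patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
            f.setLenient(false);                       // reject 13/45/2011
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            formats.add(f);
        }
    }

    // Returns the parsed date, or null if no format consumes the
    // entire token.
    Date recognize(String token) {
        for (SimpleDateFormat f : formats) {
            ParsePosition pos = new ParsePosition(0);
            Date parsed = f.parse(token, pos);
            if (parsed != null && pos.getIndex() == token.length()) {
                return parsed;
            }
        }
        return null;
    }
}
```

A recognizer built as `new MultiFormatDateRecognizer("MM/dd/yy", "MM/dd/yyyy")` can then be called once per token; a non-null result is the value one would store in the DateAttribute described above.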

Gene Golovchinsky answered Jun 13 '23 14:06