I was looking to get people's thoughts on keeping a Lucene index up to date as changes are made to the domain model objects of an application.
The application in question is a Java/J2EE based web app that uses Hibernate. The way I currently have things working is that the Hibernate mapped model objects all implement a common "Indexable" interface that can return a set of key/value pairs to be recorded in Lucene. Whenever a CRUD operation is performed on such an object, I send it via a JMS queue to a message-driven bean (MDB) that records in Lucene the primary key of the object and the key/value pairs returned from the index() method of the Indexable object that was provided.
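To make the setup concrete, the interface might look roughly like the sketch below. Only the names "Indexable" and index() come from the description above; getId(), the Article class, and its field names are made up for illustration, and the Hibernate mapping is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical shape of the common interface described above. */
interface Indexable {
    /** Primary key recorded alongside the fields (method name is an assumption). */
    Long getId();

    /** Key/value pairs to be recorded in Lucene for this object. */
    Map<String, String> index();
}

/** Hypothetical mapped entity; Hibernate mapping omitted for brevity. */
class Article implements Indexable {
    private final Long id;
    private final String title;
    private final String body;

    Article(Long id, String title, String body) {
        this.id = id;
        this.title = title;
        this.body = body;
    }

    @Override
    public Long getId() {
        return id;
    }

    @Override
    public Map<String, String> index() {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", title);
        fields.put("body", body);
        return fields;
    }
}
```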
My main worries about this scheme are that the MDB may fall behind and be unable to keep up with the incoming indexing operations, or that some sort of error/exception may stop an object from being indexed. The result is an out-of-date index for either a short or a long period of time.
Basically, I was just wondering what kind of strategies others have come up with for this sort of thing. I'm not necessarily looking for one correct answer; I'm imagining a list of "whiteboard" sort of ideas to get my brain thinking about alternatives.
Change the message: just provide the primary key and the current date, not the key/value pairs. Your MDB fetches the entity by primary key and calls index(). After indexing, you set a value "updated" in your index to the message date. You update your index only if the message date is after the "updated" field of the index. This way a delayed or out-of-order message can't write stale values, because you always fetch the current key/value pairs first.
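A minimal sketch of that check, using only the JDK so it can be run on its own. Everything here is an assumption for illustration: the in-memory map stands in for the Lucene index, the loader function stands in for "fetch the entity via Hibernate and call index()", and the class and method names are invented. Removing the entry when the entity no longer exists is an extra choice that is not part of the suggestion above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Sketch of "primary key + date" message handling; all names are hypothetical. */
public class TimestampGuardedIndexer {

    /** Stand-in for one index entry: the indexed fields plus the "updated" marker. */
    static final class IndexEntry {
        final Map<String, String> fields;
        final long updated;

        IndexEntry(Map<String, String> fields, long updated) {
            this.fields = fields;
            this.updated = updated;
        }
    }

    // Stand-in for the Lucene index, keyed by primary key.
    private final Map<Long, IndexEntry> index = new HashMap<>();

    // Stand-in for loading the entity and calling index(); returns null if it no longer exists.
    private final Function<Long, Map<String, String>> loadCurrentFields;

    public TimestampGuardedIndexer(Function<Long, Map<String, String>> loadCurrentFields) {
        this.loadCurrentFields = loadCurrentFields;
    }

    /** Handles one message; returns true if the index was changed. */
    public boolean onMessage(long primaryKey, long messageDate) {
        IndexEntry existing = index.get(primaryKey);
        if (existing != null && messageDate <= existing.updated) {
            // The entry was written for a message at least as new as this one: skip.
            return false;
        }
        // Always read the current state rather than trusting data carried in the message.
        Map<String, String> current = loadCurrentFields.apply(primaryKey);
        if (current == null) {
            // Entity is gone (assumed handling): drop any entry for it.
            index.remove(primaryKey);
            return existing != null;
        }
        index.put(primaryKey, new IndexEntry(new HashMap<>(current), messageDate));
        return true;
    }

    /** Returns the indexed fields for a key, or null if nothing is indexed. */
    public Map<String, String> fieldsFor(long primaryKey) {
        IndexEntry entry = index.get(primaryKey);
        return entry == null ? null : entry.fields;
    }
}
```

With a real Lucene index, the "updated" value would be stored as a field on the document and read back before deciding whether to reindex; the comparison itself stays the same.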
As an alternative: have a look at http://www.compass-project.org.