We have a general question about best practice during a long index rebuild. The question is not Solr-specific; it could just as well apply to raw Lucene or any other similar indexing tool, library, or black box.
The question
What is the best practice for ensuring that a Solr/Lucene index is "absolutely up to date" after a long index rebuild? That is, if users add, change, or delete DB records or files (PDFs) during the course of a 12-hour index rebuild, how do you ensure that the rebuilt index "includes" these changes at the very end?
Context
Current Approach
Proposed approach
Follow on
Thanks
There are a number of ways to skin this cat. I am guessing that during the long indexing process of core1 (aka the "on deck" core) you are running user queries against an already populated core0 (aka the "live" core).
If you can distinguish what has changed, why not just update the live core? If you can run queries against the live core and your filesystem of PDFs to find out which documents have been updated and which have been deleted, just do it all against the live core and ditch the offline process. This would be the simplest option. Store the update time of each PDF in its Solr document so you can detect which ones have changed. If a PDF doesn't exist in Solr, add it. Keep a list of Solr document ids, and at the end, delete any that didn't have a matching PDF. In the meantime you still have your real-time updates coming in.
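As a rough sketch of that reconciliation step, assuming each Solr document id is the PDF's path and a stored field holds the file's modification time: the function below only works out what needs to happen and leaves the actual Solr calls to you. The names `plan_reconcile` and `ReconcilePlan` are made up for illustration, and the two dictionaries stand in for a filesystem walk and a query against the live core.

```python
from dataclasses import dataclass, field


@dataclass
class ReconcilePlan:
    to_add: list = field(default_factory=list)      # on disk, not in the index
    to_update: list = field(default_factory=list)   # on disk is newer than the index
    to_delete: list = field(default_factory=list)   # in the index, no longer on disk


def plan_reconcile(files_on_disk, indexed_docs):
    """Compare disk state with index state.

    files_on_disk: dict of PDF path -> modification time (assumed input)
    indexed_docs:  dict of Solr document id -> stored modification time
                   (assumes the id is the PDF path)
    """
    plan = ReconcilePlan()
    for path, mtime in sorted(files_on_disk.items()):
        stored = indexed_docs.get(path)
        if stored is None:
            plan.to_add.append(path)
        elif mtime > stored:
            plan.to_update.append(path)
    # Anything indexed that has no matching PDF on disk can be deleted.
    plan.to_delete = sorted(set(indexed_docs) - set(files_on_disk))
    return plan


disk = {"a.pdf": 100, "b.pdf": 250, "c.pdf": 300}
index = {"a.pdf": 100, "b.pdf": 200, "d.pdf": 50}
plan = plan_reconcile(disk, index)
print(plan.to_add, plan.to_update, plan.to_delete)
```

Comparing timestamps only catches files whose modification time actually moved forward, so a restored backup with old timestamps would slip through; a content hash would be a sturdier check if that matters to you.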
You could proxy the incoming live updates and fan them out (multiplex them) so that they go to both core1 and core0. I've built a simple proxy interface like this and found it very straightforward. That way all your updates go to both of your cores and you don't have to do any "reconciliation".
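Here is a minimal sketch of that kind of fan-out, not the proxy I actually built. It assumes each core is reachable through a plain callable that accepts one update; `UpdateFanOut` and the sender callables are hypothetical names, and in real use each sender would post to the corresponding core's update handler.

```python
class UpdateFanOut:
    """Forward every incoming update to all registered cores."""

    def __init__(self, senders):
        # senders: dict of core name -> callable taking one update (assumed interface)
        self.senders = dict(senders)

    def send(self, update):
        """Send the update to every core; return a dict of core name -> exception."""
        failures = {}
        for name, sender in self.senders.items():
            try:
                sender(update)
            except Exception as exc:
                # Keep going so one unavailable core does not block the other.
                failures[name] = exc
        return failures


# Stand-ins for the two cores: plain lists that record what they receive.
live, on_deck = [], []
proxy = UpdateFanOut({"core0": live.append, "core1": on_deck.append})
failures = proxy.send({"op": "add", "id": "a.pdf"})
print(failures, live, on_deck)
```

The one thing to decide up front is what to do with the returned failures: if an update reaches core0 but not core1, you need to retry or log it, otherwise the two cores drift apart and you are back to reconciling.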
Lastly, you can merge two cores: http://wiki.apache.org/solr/MergingSolrIndexes#Merging_Through_CoreAdmin I don't really know what happens, though, if you have two documents with the same id, or if a document exists in one core but not in the other. I assume it's all an additive process, but you'd want to dig into this.
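For reference, the CoreAdmin merge is just an HTTP request with `action=mergeindexes`, a target `core`, and one or more `srcCore` parameters. The sketch below only builds that URL; the host, port, and the choice of which core is the target are assumptions, and `merge_indexes_url` is a made-up helper name. Check the page linked above for the preconditions before actually issuing the request.

```python
from urllib.parse import urlencode


def merge_indexes_url(base_url, target_core, source_cores):
    """Build a CoreAdmin mergeindexes URL (base_url is an assumed Solr root)."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    # One srcCore parameter per source core to merge into the target.
    params += [("srcCore", name) for name in source_cores]
    return f"{base_url.rstrip('/')}/admin/cores?{urlencode(params)}"


# Assumed local Solr instance; core names taken from the discussion above.
print(merge_indexes_url("http://localhost:8983/solr", "core0", ["core1"]))
```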
Love to hear how this goes!