I have millions of documents in my Solr index. Only a thousand of those documents have field A, whose schema I want to change. The changes include switching multiValued from true to false, stored from false to true, and type from text to string, all of which require a re-index. Re-indexing the thousand documents will take me a few minutes, whereas re-indexing everything will take days.
The re-indexing page on the Solr wiki (http://wiki.apache.org/solr/HowToReindex) says "you may need to delete all documents before you begin your indexing process", but doesn't say when you don't need to.
Can I delete just the thousand documents containing field A and re-index those thousand, or do I need to delete the entire index (all documents) before re-indexing them all?
I've tested the "deleting the few" scenario in a small sample index, and updates and queries work as expected on the changed field. However, I don't know whether I just got lucky and some problems are lurking because I didn't delete everything.
Just keep in mind that when you index a document with the same ID as an existing one, the old document is automatically marked as 'deleted' but is not physically removed from the index. Term vector analysis still takes all documents into account, including the ones marked as deleted.
If you need to physically clean up deleted documents, you need to run an index 'Optimize'. You can do this from the Solr admin interface.
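If you would rather trigger the optimize from a script than from the admin interface, here is a minimal sketch. It assumes a Solr instance at http://localhost:8983/solr and a core named mycore (both placeholders, not taken from the question), and that the core's update handler accepts the optimize=true request parameter. The helper names are made up for illustration.

```python
import urllib.parse
import urllib.request


def optimize_url(solr_base_url, core):
    """Build the update-handler URL that asks Solr to optimize one core."""
    query = urllib.parse.urlencode({"optimize": "true"})
    return f"{solr_base_url.rstrip('/')}/{core}/update?{query}"


def run_optimize(solr_base_url, core):
    """Send the optimize request and return the raw response body."""
    # Assumption: Solr is reachable at this URL without authentication.
    with urllib.request.urlopen(optimize_url(solr_base_url, core)) as resp:
        return resp.read()


if __name__ == "__main__":
    # Placeholder host and core name; adjust to your setup.
    print(optimize_url("http://localhost:8983/solr", "mycore"))
```

Optimize rewrites the index segments, so on an index with millions of documents it can take a while. It is probably best run once, after all the re-indexing is done.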
So if I were in your place, I would not even delete anything. I would just re-index only the affected documents, then optimize later to clean up the index.
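A rough sketch of that partial re-index, in case it helps. It assumes the JSON update handler is available at /update on your Solr version, that the core is called mycore, and that your documents have a uniqueKey field named id. None of that comes from the question, so treat the names as placeholders. The delete-by-query step is optional and only included for the "delete just the thousand" variant; if you use it, run it while field A is still searchable under the old schema.

```python
import json
import urllib.request


def delete_by_query_payload(field):
    """JSON body that deletes every document having any value in `field`."""
    return {"delete": {"query": f"{field}:[* TO *]"}}


def build_requests(update_url, field, docs, delete_first=False):
    """Return the (url, payload) pairs to send, in order."""
    commit_url = f"{update_url}?commit=true"
    requests = []
    if delete_first:
        requests.append((commit_url, delete_by_query_payload(field)))
    # Re-posting documents with an existing uniqueKey overwrites the old copies.
    requests.append((commit_url, docs))
    return requests


def post_json(url, payload):
    """POST a JSON payload to Solr and return the raw response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Placeholder URL and documents; "id" is an assumed uniqueKey field.
    update_url = "http://localhost:8983/solr/mycore/update"
    docs = [{"id": "doc-1", "A": "new single value"}]
    for url, payload in build_requests(update_url, "A", docs):
        print(url, json.dumps(payload))
```

Building the requests separately from sending them makes it easy to print and check what would be sent before running it against the real index.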