We need to create our index in Solr, and it is taking far too long. We have about 800k records, and at the current indexing rate it looks like it will take 15 to 20 days. For now we only need a one-time index. Any suggestions?
In my experience, indexing big chunks of data can take a while. The index I'm working on has 2 million items (size: 10 GB). A full index takes about 40 hours using the DB.
There are some factors that might be slowing you down:
I wrote a system to index about 300,000 records, and after some performance tests I configured Solr to commit every 5 minutes. Look at solrconfig.xml. There are several directives related to committing changes, but you should not be committing after each record update. Either commit after every 100-200 records or commit every 5 minutes. This is especially important during a reindex of all data.
I chose 5 minutes because it is also a reasonable setting for ongoing sync, since we poll our DB for changes every minute. We tell users that it takes 5 minutes or so for changes to flow through to Solr, and so far everyone is happy with that.
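To illustrate the idea of not committing per record, here is a minimal sketch in Python. It is not the code I actually run: `send_batch` and `commit` are hypothetical callables that you would wrap around whatever Solr client you use, and the batch size and interval are just the numbers mentioned above.

```python
import time
from typing import Callable, Iterable, List


def index_in_batches(
    records: Iterable[dict],
    send_batch: Callable[[List[dict]], None],  # hypothetical: posts docs to Solr, no commit
    commit: Callable[[], None],                # hypothetical: issues a Solr commit
    batch_size: int = 200,
    commit_interval_s: float = 300.0,          # 5 minutes
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Send records in batches and commit on a timer, never per record.

    Returns the number of commits issued.
    """
    batch: List[dict] = []
    commits = 0
    last_commit = clock()

    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            send_batch(batch)
            batch = []
            # Commit only if enough time has passed since the last commit.
            if clock() - last_commit >= commit_interval_s:
                commit()
                commits += 1
                last_commit = clock()

    # Flush whatever is left, then commit once so everything becomes visible.
    if batch:
        send_batch(batch)
    commit()
    commits += 1
    return commits
```

With a setup like this, a full reindex of a few hundred thousand records issues a handful of commits instead of one per record. If you would rather let Solr handle the timing itself, the auto-commit settings in solrconfig.xml serve the same purpose and the client then does not need to call commit at all.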