
Solr Incremental backup on real-time system with heavy index

I have implemented a search engine with Solr that imports a minimum of 2 million documents per day. Users must be able to search the imported documents as soon as possible (near real-time).

I am using two dedicated Windows x64 servers with Tomcat 6 (Solr in shard mode). Each server indexes about 120 million documents, about 220 GB per server (roughly 500 GB in total).

I want to take incremental backups of the Solr index files while updates and searches are running.
After searching for a solution, I found rsync for UNIX and DeltaCopy for Windows (a GUI rsync for Windows), but I get a "file has vanished" error during updates.

How can I solve this problem?

Note 1: Plain file copy is really slow when the files are very large, so I can't use that approach.

Note 2: Can I prevent corrupted index files during updates if Windows crashes, the hardware resets, or some other problem occurs?

asked Jun 21 '10 by Hamid

1 Answer

You can take a hot backup (i.e. while writing to the index) using the ReplicationHandler to copy Solr's data directory elsewhere on the local system. Then do whatever you like with that directory. You can launch the backup whenever you want by going to a URL like this:

http://host:8080/solr/replication?command=backup&location=/home/jboss/backup

Obviously you could script that with wget+cron.
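For example, a crontab entry like the following would take a nightly backup (the host, port, and backup path are placeholders to adapt to your setup):

    # Trigger a Solr backup every night at 2 AM; the response is discarded.
    0 2 * * * wget -q -O /dev/null "http://localhost:8080/solr/replication?command=backup&location=/backups/solr"

The backup command returns immediately and the snapshot is written in the background, so the cron job itself finishes quickly.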

More details can be found here:

http://wiki.apache.org/solr/SolrReplication

The Lucene in Action book has a section on hot backups with Lucene, and it appears to me that the code in Solr's ReplicationHandler uses the same strategy as outlined there. One of that book's authors even elaborated on how it works in another StackOverflow answer.
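For the curious, here is a rough sketch of that strategy at the Lucene level using SnapshotDeletionPolicy. This is a sketch only: exact class and method signatures vary across Lucene versions, and the paths and class name are hypothetical.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
    import org.apache.lucene.index.SnapshotDeletionPolicy;
    import org.apache.lucene.store.FSDirectory;

    public class HotBackupSketch {
        public static void main(String[] args) throws Exception {
            Path indexPath = Paths.get("/path/to/index");   // placeholder
            Path backupPath = Paths.get("/path/to/backup"); // placeholder

            // Wrap the deletion policy so a commit point can be "pinned"
            // while its files are copied.
            SnapshotDeletionPolicy snapshotter =
                new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
            IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
            config.setIndexDeletionPolicy(snapshotter);

            try (FSDirectory dir = FSDirectory.open(indexPath);
                 IndexWriter writer = new IndexWriter(dir, config)) {

                writer.commit(); // snapshot() needs at least one commit to exist

                // Pin the current commit; merges may continue, but the pinned
                // files will not be deleted until the snapshot is released.
                IndexCommit commit = snapshotter.snapshot();
                try {
                    for (String fileName : commit.getFileNames()) {
                        // Lucene segment files are write-once, so copying them
                        // while indexing continues elsewhere is safe.
                        Files.copy(indexPath.resolve(fileName),
                                   backupPath.resolve(fileName),
                                   StandardCopyOption.REPLACE_EXISTING);
                    }
                } finally {
                    snapshotter.release(commit); // unpin; files may be cleaned up
                }
            }
        }
    }

This is essentially what the ReplicationHandler does for you server-side, which is why the backup URL above is usually the simpler route with Solr.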

answered Oct 21 '22 by Paul A Jungwirth