How to move data to a Solr production instance without re-indexing?

Tags:

solr

We have an offline system where we consume input documents from external sources, transform them and store them in Solr, one collection at a time.

There is a production Solr instance, running the same Solr version as the offline instance but with a different configuration, to which the data needs to be moved once it is ready. This job runs periodically: every time new data arrives, it replaces the documents of a collection with the same name and schema in the production instance.

Is it in any way possible to do this without having to re-index the collection in the production instance? Is there some sort of back-up and restore mechanism that will allow us to copy the data, index and all, into the production system with minimal downtime?

Sanjay asked Aug 27 '15 07:08




1 Answer

You can try making a backup on one system and a restore on the other:

Backup:

http://localhost:8983/solr/your-collection-name/replication?command=backup&location=d:\\solr-backup

Restore:

http://localhost:8983/solr/your-collection-name/replication?command=restore&location=d:\\solr-backup
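If the move is part of the periodic job, the two requests can also be sent from code. Below is a minimal Python sketch using only the standard library, assuming a standalone Solr reachable over HTTP. `replication_url`, `backup` and `restore` are hypothetical helper names, and `wt=json` is added only so the response can be parsed as JSON.

```python
# Minimal sketch: send the ReplicationHandler backup/restore commands over HTTP.
# Helper names are hypothetical; host, core and folder values are up to the caller.
import json
import urllib.parse
import urllib.request


def replication_url(base_url: str, core: str, command: str, **params: str) -> str:
    """Build a /replication URL; urlencode escapes the folder path in `location`."""
    query = urllib.parse.urlencode({"command": command, **params, "wt": "json"})
    return f"{base_url.rstrip('/')}/{urllib.parse.quote(core)}/replication?{query}"


def call(url: str) -> dict:
    """Send a GET request and decode Solr's JSON response."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def backup(base_url: str, core: str, location: str) -> dict:
    """Ask the server at base_url to back up `core` into `location`."""
    return call(replication_url(base_url, core, "backup", location=location))


def restore(base_url: str, core: str, location: str) -> dict:
    """Ask the server at base_url to restore `core` from the backup in `location`."""
    return call(replication_url(base_url, core, "restore", location=location))
```

For example, `backup("http://offline-solr:8983/solr", "my-core", "/var/solr/backup")` on the offline server, then (after copying the backup folder) `restore("http://prod-solr:8983/solr", "my-core", "/var/solr/backup")` on production; all host names, the core name and the folder are placeholders. As far as I know both commands return before the work is finished, so checking `command=details` on the source and `command=restorestatus` on the target before moving on is a sensible precaution.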

Replace localhost:8983 with each server's host name and port (run the backup on the source server and the restore on the target server), and replace your-collection-name with your core name. d:\\solr-backup is the folder on the server where the backup is written; make sure you copy the backup data from one server to the other before running the restore.
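For the copy step, one option is to move the newest backup folder over a shared or mounted filesystem. When `command=backup` is called without a `name`, Solr writes the backup into a `snapshot.<timestamp>` folder inside `location`, and `command=restore` without a `name` picks the latest such folder. The sketch below assumes exactly that (no named backups) and that both servers' backup folders are reachable as paths from one machine; `latest_snapshot` and `copy_latest_snapshot` are hypothetical helpers.

```python
# Minimal sketch: copy the newest snapshot.<timestamp> folder from the source
# backup folder to the target backup folder. Assumes both are local or mounted
# paths and that backups were taken without a `name` parameter.
import shutil
from pathlib import Path


def latest_snapshot(backup_dir: Path) -> Path:
    """Return the newest snapshot.<timestamp> directory in backup_dir."""
    snapshots = sorted(p for p in backup_dir.glob("snapshot.*") if p.is_dir())
    if not snapshots:
        raise FileNotFoundError(f"no snapshot.* directory in {backup_dir}")
    # The fixed-width timestamp suffix sorts lexicographically in time order.
    return snapshots[-1]


def copy_latest_snapshot(source_dir: Path, target_dir: Path) -> Path:
    """Copy the newest snapshot into target_dir and return the copied path."""
    snapshot = latest_snapshot(source_dir)
    destination = target_dir / snapshot.name
    shutil.copytree(snapshot, destination)  # raises if destination already exists
    return destination
```

For example, `copy_latest_snapshot(Path("/mnt/offline-solr/backup"), Path("/var/solr/backup"))`, with both paths hypothetical, would run between the backup request on the offline server and the restore request on production.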

See also the Solr wiki.

hinneLinks answered Sep 20 '22 01:09