We use a training server to build Solr indexes and upload them to another (Solr) server via rsync.
Until now, everything has been fine. Recently, the index size on one core has increased drastically, and our Solr instances refuse to read the indexes on that core. They ignore those indexes without throwing any exceptions. (We do reload the cores or restart Tomcat after each rsync.) For example, in the Solr stats numDocs is 0, and /select?q=*:* returns no results.
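For anyone who wants to reproduce the check, here is a minimal sketch of the match-all query in Python. The base URL and core name passed in are placeholders (assumptions, not taken from our setup); the only Solr behaviour relied on is that a JSON select response carries the hit count in response.numFound.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url: str, core: str) -> str:
    """Build a match-all query URL that asks for the hit count only (rows=0)."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/select?{params}"


def num_found(response_body: str) -> int:
    """Extract numFound from the body of a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def count_docs(base_url: str, core: str) -> int:
    """Query the core over HTTP and return how many documents it can see."""
    with urlopen(select_url(base_url, core), timeout=10) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling count_docs with the Solr base URL and the core name after each rsync and reload shows right away whether the core picked up the new index; in our case it comes back as 0.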
To rule out corrupted indexes, we have regenerated them a couple of times, but nothing changed. Smaller indexes are read fine.
Our solrconfig.xml for this core: https://gist.github.com/983ebb13c895c9cccbfb
Solr works by gathering, storing, and indexing documents from different sources and making them searchable in near real-time. It follows a three-step process of indexing, querying, and ranking the results, and it can handle huge volumes of data.
By adding content to an index, we make it searchable by Solr. A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF.
Apache Solr stores the data it indexes in the local filesystem by default. It also supports storing data in HDFS (Hadoop Distributed File System), which provides large-scale, distributed storage with redundancy and failover capabilities.
Copying your index using rsync is a bad idea. Your Solr server may not have finished writing files to disk when you initiate the copy operation, and you could end up with corruption. The only safe way to do this is to shut down the master (source index), shut down the slave (destination index), remove the entire contents of the slave's index directory, copy the master's index across, and then restart everything.
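The sequence above could be scripted roughly as follows. This is only a sketch under assumptions: the Tomcat service name, the host name, and the index paths are hypothetical, and instead of a separate removal step it relies on rsync's --delete option, which drops any file on the destination that no longer exists on the source.

```python
import subprocess
from typing import List


def build_sync_plan(src_index: str, dest_host: str, dest_index: str) -> List[List[str]]:
    """Return the ordered commands for a cold copy of an index directory."""
    # Hypothetical service commands; adjust to however Tomcat is managed.
    stop = ["sudo", "service", "tomcat", "stop"]
    start = ["sudo", "service", "tomcat", "start"]
    source = src_index.rstrip("/") + "/"  # trailing slash: copy the contents
    target = f"{dest_host}:{dest_index.rstrip('/')}/"
    return [
        stop,                                          # stop the master
        ["ssh", dest_host, *stop],                     # stop the slave
        ["rsync", "-a", "--delete", source, target],   # mirror the index
        ["ssh", dest_host, *start],                    # restart the slave
        start,                                         # restart the master
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for cmd in plan:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # abort on the first failure
```

Running run_plan(build_sync_plan(...)) with the default dry_run=True only prints the commands, which makes it easy to review the order before anything is stopped or overwritten.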
A better approach is what was suggested by Peer Allan above - use Solr's built-in replication support. See http://wiki.apache.org/solr/SolrReplication.
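Once replication is configured, the slave can be driven over HTTP. A small sketch, assuming the replication handler is registered at /replication in each core's solrconfig.xml and that the slave already knows its master URL; the host and core names used with it are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def replication_url(base_url: str, core: str, command: str) -> str:
    """Build a URL for a command on a core's replication handler."""
    params = urlencode({"command": command, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/replication?{params}"


def replication_command(base_url: str, core: str, command: str) -> str:
    """Send the command and return the raw response body."""
    with urlopen(replication_url(base_url, core, command), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Sending the fetchindex command to the slave asks it to pull the latest index from the master, and the details command reports the current replication state, so no files have to be copied by hand.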