Best way to keep index real time?

I have a Solr/Lucene index of approximately 700 GB. The documents I need to index arrive in real time: roughly 1,000 documents are submitted every 30 minutes and need to be indexed. Because new documents must become searchable as soon as possible, a script runs every 30 minutes and indexes the documents that have not been indexed yet, but this process slows down searching.
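A minimal sketch of such a periodic job follows. It assumes Solr's XML update format and a single `<commit/>` sent after the whole run, so each 30-minute cycle opens one new searcher rather than one per document. The helpers `batches` and `build_add_message`, the batch size, and the field names are hypothetical; actually posting the messages to the update handler is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, Iterator, List


def batches(docs: List[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def build_add_message(docs: Iterable[Dict[str, str]]) -> str:
    """Render documents as a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # One <field name="..."> element per document field.
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


# Sent once, after every batch of the run has been posted.
COMMIT_MESSAGE = "<commit/>"
```

Committing once per run, instead of after each batch, keeps the number of new searchers (and their warm-up cost) to one per cycle.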

Is this the best way to index the latest documents, or is there a better way?

Ahsan Iqbal asked Oct 25 '10 13:10


1 Answer

First, remember that Solr is not a real-time search engine (yet). There is still work to be done.

You can use a master/slave setup, where indexing is done on the master and searching on the slave. This way, indexing does not affect search performance. After the commit on the master completes, force the slave to fetch the latest index from the master. While the new index is being replicated to the slave, the slave keeps serving queries from the previous index.
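The commit-then-replicate sequence could be scripted roughly as below. This is a sketch under assumptions: the host names are hypothetical, a single-core layout with the update handler at `/update` and the ReplicationHandler at `/replication` is assumed, and the slave is triggered with the ReplicationHandler's `fetchindex` command (optionally overriding the master with `masterUrl`). Only standard-library calls are used.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical hosts; single-core URL layout assumed.
MASTER = "http://master.example.com:8983/solr"
SLAVE = "http://slave.example.com:8983/solr"


def commit_url(base: str) -> str:
    """URL that asks the update handler on `base` to commit."""
    return f"{base}/update?{urlencode({'commit': 'true'})}"


def fetchindex_url(slave: str, master: str = "") -> str:
    """URL that tells a slave's ReplicationHandler to pull the index now."""
    params = {"command": "fetchindex"}
    if master:
        # Optional one-off override; otherwise the slave uses the
        # masterUrl configured in its own solrconfig.xml.
        params["masterUrl"] = f"{master}/replication"
    return f"{slave}/replication?{urlencode(params)}"


def publish_new_documents() -> None:
    """Commit on the master first, then trigger replication on the slave."""
    for url in (commit_url(MASTER), fetchindex_url(SLAVE)):
        with urlopen(url, timeout=60) as response:
            response.read()
```

Triggering `fetchindex` explicitly, rather than waiting for the slave's next poll, makes the new documents visible on the slave shortly after the master commits.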

Also, check your cache-warming settings: if they are too aggressive, they can slow down searches. Check the queries run by the newSearcher event listener as well.
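As an illustration of what to look at, the sketch below generates the two relevant `solrconfig.xml` fragments: a `newSearcher` listener (`solr.QuerySenderListener`) with a short, explicit list of warm-up queries, and a `filterCache` whose `autowarmCount` limits how many entries are regenerated from the old cache. The queries, sizes, and cache class are illustrative assumptions; cache class names differ between Solr versions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def new_searcher_listener(queries: List[Dict[str, str]]) -> str:
    """Build a <listener> that runs each query against a new searcher."""
    listener = ET.Element(
        "listener", {"event": "newSearcher", "class": "solr.QuerySenderListener"}
    )
    arr = ET.SubElement(listener, "arr", name="queries")
    for params in queries:
        # Each warm-up query is a <lst> of request parameters.
        lst = ET.SubElement(arr, "lst")
        for name, value in params.items():
            ET.SubElement(lst, "str", name=name).text = value
    return ET.tostring(listener, encoding="unicode")


def filter_cache(size: int, autowarm_count: int) -> str:
    """Build a <filterCache>; autowarmCount bounds the warm-up work."""
    cache = ET.Element(
        "filterCache",
        {
            "class": "solr.LRUCache",
            "size": str(size),
            "initialSize": str(size),
            "autowarmCount": str(autowarm_count),
        },
    )
    return ET.tostring(cache, encoding="unicode")
```

Keeping `autowarmCount` low and the warm-up list short bounds the extra work done each time a commit opens a new searcher.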

Pascal Dimassimo answered Oct 30 '22 22:10