
Choosing a Solr/Lucene commit strategy

I have 120k db records to commit into a Solr index.

My question is: should I commit after submitting every 10k records, or commit only once, after submitting all 120k records?

Is there any difference between these two options?

mlzboy asked Oct 11 '10 15:10



3 Answers

Use Solr's default auto-commit values, which I believe are quite reasonable. If not, you can adjust them to suit your needs:

<!-- Autocommit pending docs if certain criteria are met. Future versions may expand the available criteria. -->
<autoCommit>
  <maxDocs>10000</maxDocs> <!-- maximum uncommitted docs before an autocommit is triggered -->
  <maxTime>50000</maxTime> <!-- maximum time (in ms) after adding a doc before an autocommit is triggered -->
</autoCommit>

With these values, Solr commits once 10,000 documents are waiting to be committed, or 50 seconds after a document was added, whichever comes first.

dogbane answered Nov 08 '22 17:11



According to the Lucene 2.9.3 documentation, commit() makes the added documents visible to readers and writes all pending added and deleted documents to the index on disk. It is a costly operation.

So if you want to search some of the documents while others are still being added, or want a guarantee that a machine failure will cost you no more than 10,000 added documents, commit every 10,000 records.

On the other hand, if you would rather save the cost of the extra commits and are not worried about losing documents if the machine fails, commit only once, after all of the documents have been added.
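The trade-off between the two strategies can be sketched in a few lines of Java. This is only an illustration: `Index` and `CountingIndex` are hypothetical stand-ins that count commits, not SolrJ or Lucene types, and the record contents are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index client; NOT a real SolrJ or Lucene type.
interface Index {
    void add(List<String> docs);
    void commit();
}

// Fake index that only tracks how many docs are pending and committed.
class CountingIndex implements Index {
    int pending = 0;    // docs added but not yet committed (lost on a crash)
    int committed = 0;  // docs made durable and visible by a commit
    int commits = 0;    // number of (costly) commit calls

    public void add(List<String> docs) { pending += docs.size(); }

    public void commit() {
        committed += pending;
        pending = 0;
        commits++;
    }
}

class CommitStrategies {
    // Adds totalRecords docs in batches of batchSize.
    // commitPerBatch = true  -> commit after every batch (less data at risk, more commits)
    // commitPerBatch = false -> commit once at the end (one commit, everything at risk until then)
    static void indexAll(Index index, int totalRecords, int batchSize, boolean commitPerBatch) {
        for (int start = 0; start < totalRecords; start += batchSize) {
            int end = Math.min(start + batchSize, totalRecords);
            List<String> batch = new ArrayList<>(end - start);
            for (int i = start; i < end; i++) {
                batch.add("record-" + i); // placeholder document
            }
            index.add(batch);
            if (commitPerBatch) {
                index.commit();
            }
        }
        if (!commitPerBatch) {
            index.commit();
        }
    }

    public static void main(String[] args) {
        CountingIndex perBatch = new CountingIndex();
        indexAll(perBatch, 120_000, 10_000, true);
        System.out.println("commit every 10k: " + perBatch.commits + " commits");

        CountingIndex single = new CountingIndex();
        indexAll(single, 120_000, 10_000, false);
        System.out.println("single commit:    " + single.commits + " commits");
    }
}
```

For the 120k records in the question, the first strategy issues 12 commits and never has more than 10,000 documents uncommitted, while the second issues one commit and keeps all 120,000 uncommitted until the end.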

Yuval F answered Nov 08 '22 18:11



The recommended way is to use commitWithin instead of <autoCommit>.

If you are using SolrJ, almost all of its update methods accept a commitWithin parameter that enables this feature.
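For readers not using SolrJ, commitWithin can also be set as an attribute on the XML add message sent to the update handler; the value is the number of milliseconds within which Solr should commit the document. The helper below is a minimal sketch that only builds such a message as a string. The class name, the single `id` field, and the 10-second value are assumptions chosen for illustration, and the message still has to be posted to your own Solr update URL.

```java
// Minimal sketch: builds a Solr XML <add> message carrying a commitWithin attribute.
// The "id" field name is an assumption about the schema; adjust it to your own fields.
class CommitWithinMessage {

    // Escapes the characters that are unsafe inside XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // commitWithinMs: maximum delay, in milliseconds, before Solr commits this document.
    static String addMessage(String id, int commitWithinMs) {
        if (commitWithinMs <= 0) {
            throw new IllegalArgumentException("commitWithinMs must be positive");
        }
        return "<add commitWithin=\"" + commitWithinMs + "\">"
             + "<doc><field name=\"id\">" + escape(id) + "</field></doc>"
             + "</add>";
    }

    public static void main(String[] args) {
        // Ask Solr to commit this document within 10 seconds of receiving it.
        System.out.println(addMessage("42", 10_000));
    }
}
```

With this approach the client never calls commit explicitly; each add states how long it is willing to wait, and Solr decides when to commit within that limit.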

Mohsen answered Nov 08 '22 18:11
