
Lucene IndexWriter thread safety

Tags: java, lucene

Lucene encourages the reuse of an IndexWriter from multiple threads.

Given that two threads might hold a reference to the same IndexWriter, if thread A calls close on the writer, thread B would be left with a useless writer. But to my understanding, Lucene somehow knows that another thread is using the same writer and defers its closure.

Is this indeed the case? How does Lucene track that another thread is using the writer?

EDIT: Judging from the answers, it is not correct to close the IndexWriter while it is shared. But this poses a new issue: if one keeps an IndexWriter open, it essentially blocks access to this index from another JVM (e.g. in a cluster, or when an index is shared between many applications).

yannisf asked May 05 '11 14:05



2 Answers

If one thread closes the IndexWriter while other threads are still using it, you'll get unpredictable results. We try to have the other threads hit AlreadyClosedException, but this is best effort only (not guaranteed); for example, you can just as easily hit a NullPointerException. So you must synchronize externally to make sure this doesn't happen.
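One possible way to do that external synchronization is a read/write lock: threads that index take the shared (read) lock, and the thread that closes takes the exclusive (write) lock, so close waits for in-flight operations and later callers are turned away cleanly. This is only a sketch under stated assumptions. SharedWriterGuard and WriterAction are hypothetical names, not part of Lucene, and the code uses only the JDK; it relies on the writer being a java.io.Closeable, which IndexWriter is.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical helper (not a Lucene class) that serializes close() against use(). */
final class SharedWriterGuard<W extends Closeable> {

    /** An operation on the writer that is allowed to throw IOException. */
    interface WriterAction<T> {
        void run(T writer) throws IOException;
    }

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final W writer;
    private boolean closed = false; // only read or written while holding the lock

    SharedWriterGuard(W writer) {
        this.writer = writer;
    }

    /**
     * Runs the action under the shared lock, so many threads can be inside
     * use() at once. Returns false without running the action if the writer
     * has already been closed.
     */
    boolean use(WriterAction<W> action) throws IOException {
        lock.readLock().lock();
        try {
            if (closed) {
                return false;
            }
            action.run(writer);
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Waits for in-flight use() calls to finish, then closes the writer once. */
    void close() throws IOException {
        lock.writeLock().lock();
        try {
            if (!closed) {
                closed = true;
                writer.close();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With Lucene on the classpath this would be used roughly as `guard.use(w -> w.addDocument(doc))` from the indexing threads and `guard.close()` from whichever thread shuts down. Do not call close() from inside an action passed to use(): ReentrantReadWriteLock cannot upgrade a read lock to a write lock, so that call would block forever.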

Recently (only in Lucene's trunk right now, to become 4.0 eventually) a big thread bottleneck inside IndexWriter was fixed, allowing segment flushes to run concurrently (previously they were single-threaded). For apps running many indexing threads on concurrent hardware, this can give a big boost in indexing throughput. See http://blog.mikemccandless.com/2011/05/265-indexing-speedup-with-lucenes.html for details.

Michael McCandless answered Nov 06 '22 00:11



The thread safety and reuse of IndexWriter mean that multiple threads can all use the same instance to create, update, and delete documents. If you close the IndexWriter in one thread, though, it will indeed muck everyone else up.

MJB answered Nov 06 '22 00:11
