
Does Lucene.Net manage multiple threads accessing the same index, one indexing while the other is searching?

When using Lucene.Net with ASP.NET, I can imagine that one web request can trigger an update to the index while another web request is performing a search. Does Lucene.Net have a built-in ability to manage concurrent access, or do I have to manage it myself to avoid "being used by another process" errors?

EDIT: After reading the docs and experimenting, this is what I think I've learned. There are two issues: thread safety and concurrency. Multithreading is "safe" in that you can't do anything bad to the index. But it's safe at the cost of only one object holding a lock on the index at a time. A second object that comes along will throw an exception. So you can't leave a searcher open and expect a writer in another thread to be able to update the index. And if a thread is busy updating the index, then trying to create a searcher will fail.

Also, searchers see the index as it was when they opened it, so if you keep them around and then update the index, they won't see the updates.

I wanted my searchers to see the latest updates.

My design, which seems to be working so far, is that my writers and searchers share a lock, so that they don't fail; they just wait until the current write or search is done.
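The shared-lock design described above could be sketched roughly as follows. This is Python rather than C#, so that it runs without Lucene.Net, and `ToyIndex`, `add_document`, and `search` are hypothetical stand-ins for illustration, not Lucene.Net APIs. The only point is that writers and searchers go through one lock, so a second caller blocks instead of throwing.

```python
import threading


class ToyIndex:
    """Hypothetical in-memory stand-in for an index; not a Lucene.Net API."""

    def __init__(self):
        # The single lock shared by writers and searchers.
        self._lock = threading.Lock()
        # term -> set of document ids
        self._postings = {}

    def add_document(self, doc_id, text):
        # A writer waits here while a search or another write is in progress.
        with self._lock:
            for term in text.lower().split():
                self._postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        # A searcher takes the same lock, so it sees every completed write.
        with self._lock:
            return sorted(self._postings.get(term.lower(), set()))


index = ToyIndex()
threads = [
    threading.Thread(target=index.add_document, args=(i, "lucene thread safety"))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

hits = index.search("Lucene")
```

The trade-off is that searches also block each other. If that becomes a bottleneck, a reader-writer lock would let searches run in parallel while still excluding writes.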

Corey Trager asked Oct 11 '08 04:10




1 Answer

According to this page,

Indexing and searching are not only thread safe, but process safe. What this means is that:

  • Multiple index searchers can read the Lucene index files at the same time.
  • An index writer or reader can edit the Lucene index files while searches are ongoing.
  • Multiple index writers or readers can try to edit the Lucene index files at the same time (it's important for the index writer/reader to be closed so it will release the file lock).

However, the query parser is not thread safe, so each thread using the index should have its own query parser.
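One way to give each thread its own parser is thread-local storage. The sketch below shows the idea in Python; `ToyQueryParser` and `get_parser` are hypothetical names made up for illustration and are not part of Lucene.Net. In .NET the same pattern would typically use a thread-local field or simply construct a parser per request.

```python
import threading


class ToyQueryParser:
    """Hypothetical parser with mutable state, which is why sharing it is unsafe."""

    def __init__(self):
        self.tokens = []

    def parse(self, text):
        # Mutates instance state, so two threads sharing one parser could interfere.
        self.tokens = text.lower().split()
        return list(self.tokens)


_local = threading.local()


def get_parser():
    # Lazily create one parser per thread and reuse it within that thread.
    parser = getattr(_local, "parser", None)
    if parser is None:
        parser = _local.parser = ToyQueryParser()
    return parser


parsers = {}
parsed = {}


def worker(name, query):
    parser = get_parser()
    parsers[name] = parser  # keep a reference so the two instances can be compared
    parsed[name] = parser.parse(query)


t1 = threading.Thread(target=worker, args=("a", "Thread Safety"))
t2 = threading.Thread(target=worker, args=("b", "Index Writer"))
t1.start()
t2.start()
t1.join()
t2.join()
```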

The index writer, however, is thread safe, so you can update the index while people are searching it. You then have to make sure that threads with open index searchers close them and open new ones to see the newly updated data.
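The "close and reopen" behaviour can be illustrated with a toy model in which a searcher is a point-in-time snapshot. This is a sketch under that assumption only: `SnapshotIndex`, `Searcher`, and `open_searcher` are hypothetical names, written in Python so the example runs without Lucene.Net, and they do not reflect how Lucene.Net stores its index.

```python
import threading


class Searcher:
    """Hypothetical searcher bound to the documents that existed when it was opened."""

    def __init__(self, docs):
        self._docs = docs

    def search(self, term):
        return [d for d in self._docs if term in d.split()]


class SnapshotIndex:
    """Hypothetical toy index whose searchers are point-in-time snapshots."""

    def __init__(self):
        self._write_lock = threading.Lock()
        self._docs = ()  # immutable tuple, replaced on every write

    def add_document(self, text):
        # Writers are serialized; existing searchers keep their old tuple.
        with self._write_lock:
            self._docs = self._docs + (text,)

    def open_searcher(self):
        return Searcher(self._docs)


index = SnapshotIndex()
index.add_document("first document")

old_searcher = index.open_searcher()
index.add_document("second document")

# The searcher opened before the write does not see the new document.
stale_hits = old_searcher.search("document")

# Opening a new searcher picks up the update.
new_searcher = index.open_searcher()
fresh_hits = new_searcher.search("document")
```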

Judah Gabriel Himango answered Oct 15 '22 19:10