I am about to write a near-real-time search application with distributed indexes. Now I wonder what the correct approach is to implement search over multiple indexes.
I have read about the MultiSearcher, so one approach would be:
IndexSearcher[] indexSearchers = new IndexSearcher[indexCount];
for (int i = 0; i < indexCount; i++) {
    File directory = new File(indexdir, String.valueOf(i));
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(directory), analyzer, IndexWriter.MaxFieldLength.LIMITED);
    IndexReader indexReader = indexWriter.getReader();
    indexSearchers[i] = new IndexSearcher(indexReader);
}
MultiSearcher searcher = new MultiSearcher(indexSearchers);
But as far as I can see, this is also possible:
IndexReader[] indexReader = new IndexReader[indexCount];
for (int i = 0; i < indexCount; i++) {
    File directory = new File(indexdir, String.valueOf(i));
    IndexWriter indexWriter = new IndexWriter(FSDirectory.open(directory), analyzer, IndexWriter.MaxFieldLength.LIMITED);
    indexReader[i] = indexWriter.getReader();
}
IndexSearcher searcher = new IndexSearcher(new MultiReader(indexReader));
Is there any significant difference between these two approaches? The second one would be easier to handle when the reader is out of date, because I could just call MultiReader.reopen() instead of iterating over all IndexReaders, reopening them, and then creating new IndexSearchers...
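You should use the second option. MultiSearcher is deprecated in Lucene 3.x, and its documentation recommends searching over a MultiReader instead: http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/MultiSearcher.html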
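For the reopen part of the question, here is a minimal sketch of the "reopen and swap" pattern. In Lucene 3.x, reopen() returns the same instance when nothing has changed and a new reader otherwise, so the caller compares references and closes the old reader only after a swap. To keep the sketch runnable without Lucene on the classpath, ReopenableReader, ReaderHolder and FakeReader are hypothetical stand-ins, not Lucene classes; in a real application the reader would be the MultiReader from the second approach, and a new IndexSearcher would be created after each swap.

```java
/** Hypothetical stand-in for the reopen()/close() part of a Lucene 3.x reader. */
interface ReopenableReader {
    /** Returns this instance if nothing changed, otherwise a new reader. */
    ReopenableReader reopen();

    void close();
}

/** Holds the current reader and swaps in a fresh one when the index has changed. */
final class ReaderHolder {
    private ReopenableReader current;

    ReaderHolder(ReopenableReader initial) {
        this.current = initial;
    }

    synchronized ReopenableReader get() {
        return current;
    }

    /** Returns true if a new reader was swapped in. */
    synchronized boolean refresh() {
        ReopenableReader fresh = current.reopen();
        if (fresh == current) {
            return false;       // index unchanged, keep using the old reader
        }
        current.close();        // release the old reader only after a real swap
        current = fresh;
        return true;
    }
}

/** Toy reader for illustration: "reopens" when a newer version exists. */
final class FakeReader implements ReopenableReader {
    static int latestVersion = 0;   // simulates changes made by a writer
    final int version;
    boolean closed = false;

    FakeReader(int version) {
        this.version = version;
    }

    public ReopenableReader reopen() {
        return version == latestVersion ? this : new FakeReader(latestVersion);
    }

    public void close() {
        closed = true;
    }
}
```

Note that this sketch closes the old reader immediately. If searches may still be running against it, the close would have to wait until they finish, for example by reference counting.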