I'm facing long search times (order of 10s of seconds) while searching on a master - shard implementation in a distributed environment. However the same query via Luke returns in milliseconds.
The application is a distributed system. All the nodes share a common NFS mount where the indexes reside. For simplicity lets consider two nodes Node1
and Node2
. The /etc/fstab
entries is as follows.
nfs:/vol/indexes /opt/indexes nfs rw,suid,nodev,rsize=32768,wsize=32768,soft,intr,tcp 0 0
There are multiple feeds (say Feed1
and Feed2
) that hit the system and there is a shard for each of the feed per node and a master for each Feed. The indexes look like
Feed1-master
Feed1-shard-Node1.com
Feed1-shard-Node1.com0
Feed1-shard-Node1.com1
The code that does the search is
FeedIndexManager fim = getManager(feedCode);
searcher = fim.getSearcher();
TopDocs docs = searcher.search(q, filter, start + max, sort);
private FeedIndexManager getManager(String feedCode) throws IOException {
if (!_managers.containsKey(feedCode)) {
synchronized(_managers) {
if (!_managers.containsKey(feedCode)) {
File shard = getShardIndexFile(feedCode);
File master = getMasterIndexFile(feedCode);
_managers.put(feedCode, new FeedIndexManager(shard, master));
}
}
}
return _managers.get(feedCode);
}
The FeedIndexManager is as follows.
public class FeedIndexManager implements Closeable {
private static final Analyzer WRITE_ANALYZER = makeWriterAnalyzer();
private final Directory _master;
private SearcherManager _searcherManager;
private final IndexPair _pair;
private int _numFailedMerges = 0;
private DateTime _lastMergeTime = new DateTime();
public FeedIndexManager(File shard, File master) throws IOException {
_master = NIOFSDirectory.open(master, new SimpleFSLockFactory(master));
IndexWriter writer = null;
try {
writer = new IndexWriter(_master,
WRITE_ANALYZER,
MaxFieldLength.LIMITED);
} finally {
if (null != writer) {
writer.close();
}
writer = null;
}
_searcherManager = new SearcherManager(_master);
_pair = new IndexPair(_master,
shard,
new IndexWriterBuilder(WRITE_ANALYZER));
}
public IndexPair getIndexWriter() {
return _pair;
}
public IndexSearcher getSearcher() {
try {
return _searcherManager.get();
}
catch (IOException ioe) {
throw new DatastoreRuntimeException(
"When trying to get an IndexSearcher for " + _master, ioe);
}
}
public void releaseSearcher(IndexSearcher searcher) {
try {
_searcherManager.release(searcher);
}
catch (IOException ioe) {
throw new DatastoreRuntimeException(
"When trying to release the IndexSearcher " + searcher
+ " for " + _master, ioe);
}
}
/**
* Merges the changes from the shard into the master.
*/
public boolean tryFlush() throws IOException {
LOG.debug("Trying to flush index manager at " + _master
+ " after " + _numFailedMerges + " failed merges.");
if (_pair.tryFlush()) {
LOG.debug("I succesfully flushed " + _master);
_numFailedMerges = 0;
_lastMergeTime = new DateTime();
return true;
}
LOG.warn("I couldn't flush " + _master + " after " + _numFailedMerges
+ " failed merges.");
_numFailedMerges++;
return false;
}
public long getMillisSinceMerge() {
return new DateTime().getMillis() - _lastMergeTime.getMillis();
}
public long getNumFailedMerges() {
return _numFailedMerges;
}
public void close() throws IOException {
_pair.close();
}
/**
* Return the Analyzer used for writing to indexes.
*/
private static Analyzer makeWriterAnalyzer() {
PerFieldAnalyzerWrapper analyzer =
new PerFieldAnalyzerWrapper(new LowerCaseAnalyzer());
analyzer.addAnalyzer(SingleFieldTag.ID.toString(), new KeywordAnalyzer());
// we want tokenizing on the CITY_STATE field
analyzer.addAnalyzer(AddressFieldTag.CITY_STATE.toString(),
new StandardAnalyzer(Version.LUCENE_CURRENT));
return analyzer;
}
}
The killer which consumes about 95-98% of the latency is this call, it takes about 20 seconds for a search, whereas if the index is opened via Luke it is in milliseconds.
TopDocs docs = searcher.search(q, filter, start + max, sort);
I've the following questions
Is it sane to have multiple masters per feed or should I reduce it to only one master? The number of elements in the index is about 50 million.
The latency is low on feeds where the number of entities is less than a million (sub second response). The feeds where the entities are over 2 million takes about 20 seconds. Should I maintain only 1 Shard per node against 1 Shard per node per feed?
The merge from the Shard's to the master is attempted every every 15 seconds. Should this parameter be tweaked?
I'm currently using Lucene 3.1.0 and JDK 1.6. The boxes are two 64-bit cores with 8 GB of RAM. Currently the JVM runs with 4 GB max.
Any suggestion to improve the performance is highly appreciated. I've already carried out all standard performance tuning that is generally prescribed by Lucene. Thanks a lot for reading this lengthy post.
This isn't the answer you were looking for, perhaps, but have a look at Elastic Search. It's a distributed, clustered service layer around Lucene, which is queried over HTTP or can be run embedded.
And it's fast, quite ridiculously so. It seems to have tuned Lucene properly under the covers, while still exposing the full Lucene config options if you need to use them.
Making Lucene perform in a distributed environment is hard, as you're discovering, you end up with nasty locking issues. ElasticSearch is intended to solve that particular problem, so you can solve the other ones.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With