So I've been doing some research on the best way to implement Lucene.Net index searching and writing from within a web application. I set out with the following requirements:
I found some helpful resources, and a couple of good questions here on SO, like this one.
Following that post as guidance, I decided to try a singleton pattern with a concurrent dictionary of a wrapper built to manage an index.
To make things simpler, I'll pretend that I am only managing one index, in which case the wrapper can become the singleton. This ends up looking like this:
    using System;
    using System.Collections.Generic;
    using System.Threading;
    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Search;
    using Version = Lucene.Net.Util.Version;

    public sealed class SingleIndexManager
    {
        private const string IndexDirectory = "C:\\IndexDirectory\\";
        private const string IndexName = "test-index";
        private static readonly Version _version = Version.LUCENE_29;

        #region Singleton Behavior
        private static volatile SingleIndexManager _instance;
        private static readonly object syncRoot = new Object();

        public static SingleIndexManager Instance
        {
            get
            {
                // Double-checked locking: the lock is only taken on first access.
                if (_instance == null)
                {
                    lock (syncRoot)
                    {
                        if (_instance == null)
                            _instance = new SingleIndexManager();
                    }
                }
                return _instance;
            }
        }
        #endregion

        private IndexWriter _writer;
        private IndexSearcher _searcher;
        private int _activeSearches = 0;
        private int _activeWrites = 0;

        private SingleIndexManager()
        {
            lock (syncRoot)
            {
                _writer = CreateWriter(); //hidden for sake of brevity
                _searcher = new IndexSearcher(_writer.GetReader());
            }
        }

        public List<Document> Search(Func<IndexSearcher, List<Document>> searchMethod)
        {
            IndexSearcher searcher;
            lock (syncRoot)
            {
                // Replace a stale searcher only when no search is still using it.
                if (_searcher != null && !_searcher.GetIndexReader().IsCurrent() && _activeSearches == 0)
                {
                    _searcher.Close();
                    _searcher = null;
                }
                if (_searcher == null)
                {
                    _searcher = new IndexSearcher((_writer ?? (_writer = CreateWriter())).GetReader());
                }
                // Register this search while still holding the lock, so another thread
                // cannot observe _activeSearches == 0 and close the searcher under us.
                searcher = _searcher;
                Interlocked.Increment(ref _activeSearches);
            }
            try
            {
                return searchMethod(searcher);
            }
            finally
            {
                Interlocked.Decrement(ref _activeSearches);
            }
        }

        public void Write(List<Document> docs)
        {
            lock (syncRoot)
            {
                if (_writer == null)
                {
                    _writer = CreateWriter();
                }
                // Register this write under the lock, so a write that is finishing
                // cannot close _writer before this one starts using it.
                Interlocked.Increment(ref _activeWrites);
            }
            try
            {
                foreach (Document document in docs)
                {
                    _writer.AddDocument(document, new StandardAnalyzer(_version));
                }
            }
            finally
            {
                lock (syncRoot)
                {
                    // The last active write closes the writer.
                    int writers = Interlocked.Decrement(ref _activeWrites);
                    if (writers == 0)
                    {
                        _writer.Close();
                        _writer = null;
                    }
                }
            }
        }
    }
Theoretically, this should give me a thread-safe singleton instance for an index (here named "test-index") with two publicly exposed methods, Search() and Write(), which can be called from within an ASP.NET web application with no concerns regarding thread safety. (If this is incorrect, please let me know.)
There is one thing that is giving me a little bit of trouble right now:
How do I gracefully close these instances on Application_End in the Global.asax.cs file so that, if I want to restart my web application in IIS, I am not going to get a bunch of write.lock failures, etc.?
All I can think of so far is:
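    public void Close()
    {
        lock (syncRoot)
        {
            // Either field may already be null (Write() clears _writer when it finishes).
            if (_searcher != null)
            {
                _searcher.Close();
                _searcher.Dispose();
                _searcher = null;
            }
            if (_writer != null)
            {
                _writer.Close();
                _writer.Dispose();
                _writer = null;
            }
        }
    }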
and calling that in Application_End, but if I have any active searchers or writers, is this going to result in a corrupt index?
Any help or suggestions are much appreciated. Thanks.
Lucene.NET is very thread safe. I can say for sure that all of the methods on the IndexWriter and IndexReader classes are thread-safe, and you can use them without having to worry about synchronization. You can get rid of all of your code that involves synchronizing around instances of these classes.
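As a minimal sketch of that point, assuming the 2.9-era Lucene.NET API used in the question and .NET 4's Parallel.For, several threads can add documents through one shared IndexWriter without any lock of their own. The index path and the "id" field are placeholders, not something from the question.

```csharp
using System.IO;
using System.Threading.Tasks;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class SharedWriterDemo
{
    public static void Main()
    {
        // Placeholder index location; substitute your own.
        var directory = FSDirectory.Open(new DirectoryInfo(@"C:\IndexDirectory\test-index"));
        var analyzer = new StandardAnalyzer(Version.LUCENE_29);

        // One writer for the whole process, shared by every thread.
        var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
        try
        {
            // No lock statement here: AddDocument is called concurrently
            // on the same writer instance.
            Parallel.For(0, 4, i =>
            {
                var doc = new Document();
                doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.AddDocument(doc);
            });
            writer.Commit();
        }
        finally
        {
            // Close() matches the question's code; newer releases expose Dispose().
            writer.Close();
        }
    }
}
```

The writer is created once and closed once; only its lifetime needs an owner, not each individual call.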
That said, the bigger problem is using Lucene.NET from ASP.NET. ASP.NET recycles the application pool for a number of reasons; however, while it is shutting down one application domain, it brings up another one to handle new requests to the site.
If you try to access the same physical files (assuming you are using the file-system based FSDirectory) with a different IndexWriter/IndexReader, then you'll get an error, as the lock on the files hasn't been released by the application domain that hasn't been shut down yet.
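To make that failure mode concrete, here is a rough sketch of what the new application domain runs into, again assuming the 2.9-era API. WriterOpener and TryOpenWriter are hypothetical names; the sketch only shows where the lock error surfaces, it does not solve the overlap.

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class WriterOpener
{
    // Hypothetical helper: returns null when another writer (for example, one in
    // an application domain that is still shutting down) holds write.lock.
    public static IndexWriter TryOpenWriter(string indexPath)
    {
        var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
        try
        {
            return new IndexWriter(directory, new StandardAnalyzer(Version.LUCENE_29),
                                   IndexWriter.MaxFieldLength.UNLIMITED);
        }
        catch (LockObtainFailedException ex)
        {
            // The previous owner has not released the index yet.
            Console.Error.WriteLine("Index is locked by another writer: " + ex.Message);
            return null;
        }
    }
}
```

Retrying after a delay can paper over a short overlap, but it does not remove the contention between the two application domains.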
To that end, the recommended best practice is to control the process that is handling the access to Lucene.NET; this usually means creating a service in which you'd expose your operations via Remoting or WCF (preferably the latter).
It's more work this way (as you'd have to create all of the abstractions to represent your operations), but you gain the following benefits:
The service process will always be up, which means that the clients (the ASP.NET application) won't have to worry about contending for the files that FSDirectory requires. They simply have to call the service.
You're abstracting your search operations on a higher level. You aren't accessing Lucene.NET directly; rather, you're defining the operations and the types that are required for those operations. Once you have that abstracted away, if you decide to move from Lucene.NET to some other search mechanism (say, RavenDB), then it's a matter of changing the implementation of the contract.
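A contract along these lines is one way to sketch that abstraction in WCF. Every name here (ISearchService, SearchHit, IndexedItem, the net.pipe address) is made up for illustration; the point is only that nothing Lucene-specific appears in the contract.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

// Hypothetical contract: callers see search operations, not Lucene.NET types.
[ServiceContract]
public interface ISearchService
{
    [OperationContract]
    IList<SearchHit> Search(string query, int maxResults);

    [OperationContract]
    void Index(IList<IndexedItem> items);
}

[DataContract]
public class SearchHit
{
    [DataMember] public string Id { get; set; }
    [DataMember] public float Score { get; set; }
}

[DataContract]
public class IndexedItem
{
    [DataMember] public string Id { get; set; }
    [DataMember] public string Text { get; set; }
}

public static class SearchClient
{
    // Called from the ASP.NET side; the address is an assumed placeholder.
    public static IList<SearchHit> Query(string text)
    {
        var factory = new ChannelFactory<ISearchService>(
            new NetNamedPipeBinding(),
            new EndpointAddress("net.pipe://localhost/search"));
        ISearchService channel = factory.CreateChannel();
        try
        {
            return channel.Search(text, 10);
        }
        finally
        {
            ((IClientChannel)channel).Close();
            factory.Close();
        }
    }
}
```

The service process would implement ISearchService on top of a single long-lived IndexWriter, so the web application never touches the index files directly.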