Using Lucene.Net in a thread-safe way from an ASP.NET web application

So I've been doing some research on the best way to implement Lucene.Net index searching and writing from within a web application. I set out with the following requirements:

  • Concurrent searching and access to the index must be allowed (queries run in parallel).
  • There will be multiple indexes.
  • Completely up-to-date ("real-time") search results are NOT a requirement.
  • Jobs will update the indexes on some frequency (the frequency is different for each index).
  • Obviously, I would like to do all of this in a way that follows Lucene "best practices" and performs and scales well.

I found some helpful resources, and a couple of good questions here on SO like this one

Following that post as guidance, I decided to try a singleton pattern with a concurrent dictionary of a wrapper built to manage an index.

To make things simpler, I'll pretend that I am only managing one index, in which case the wrapper can become the singleton. This ends up looking like this:

public sealed class SingleIndexManager
{
    private const string IndexDirectory = "C:\\IndexDirectory\\";
    private const string IndexName = "test-index";
    private static readonly Version _version = Version.LUCENE_29;

    #region Singleton Behavior
    private static volatile SingleIndexManager _instance;
    private static readonly object syncRoot = new Object();

    public static SingleIndexManager Instance
    {
        get
        {
            if (_instance == null)
            {
                lock (syncRoot)
                {
                    if (_instance == null)
                        _instance = new SingleIndexManager();
                }
            }

            return _instance;
        }
    }
    #endregion

    private IndexWriter _writer;
    private IndexSearcher _searcher;

    private int _activeSearches = 0;
    private int _activeWrites = 0;

    private SingleIndexManager()
    {
        lock(syncRoot)
        {
            _writer = CreateWriter(); //hidden for sake of brevity
            _searcher = new IndexSearcher(_writer.GetReader());
        }
    }

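    public List<Document> Search(Func<IndexSearcher,List<Document>> searchMethod)
    {
        IndexSearcher searcher;
        lock(syncRoot)
        {
            // Swap out a stale searcher only when no search is still using it.
            if(_searcher != null && !_searcher.GetIndexReader().IsCurrent() && _activeSearches == 0)
            {
                // A searcher built from a reader does not close that reader, so close both.
                IndexReader staleReader = _searcher.GetIndexReader();
                _searcher.Close();
                staleReader.Close();
                _searcher = null;
            }
            if(_searcher == null)
            {
                _searcher = new IndexSearcher((_writer ?? (_writer = CreateWriter())).GetReader());
            }
            // Take a local reference and register the search while still holding the lock,
            // so another thread cannot close this searcher before it is counted as active.
            searcher = _searcher;
            Interlocked.Increment(ref _activeSearches);
        }
        try
        {
            return searchMethod(searcher);
        }
        finally
        {
            Interlocked.Decrement(ref _activeSearches);
        }
    }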

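    public void Write(List<Document> docs)
    {
        IndexWriter writer;
        lock(syncRoot)
        {
            if(_writer == null)
            {
                _writer = CreateWriter();
            }
            // Register this write under the lock so that another thread cannot
            // close the writer between its creation and its first use here.
            writer = _writer;
            Interlocked.Increment(ref _activeWrites);
        }
        try
        {
            // One analyzer is enough for the whole batch.
            var analyzer = new StandardAnalyzer(_version);
            foreach (Document document in docs)
            {
                writer.AddDocument(document, analyzer);
            }
        }
        finally
        {
            lock(syncRoot)
            {
                int writers = Interlocked.Decrement(ref _activeWrites);
                // The last active write closes the writer, which commits and releases write.lock.
                if(writers == 0 && _writer != null)
                {
                    _writer.Close();
                    _writer = null;
                }
            }
        }
    }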
}

In theory, this gives me a thread-safe singleton for an index (here named "test-index") with two public methods, Search() and Write(), which can be called from within an ASP.NET web application without any thread-safety concerns. (If this is incorrect, please let me know.)

There is one thing that is giving me a little trouble right now:

How do I gracefully close these instances on Application_End in the Global.asax.cs file so that if I want to restart my web application in IIS, I am not going to get a bunch of write.lock failures, etc?

All I can think of so far is:

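public void Close()
{
    lock(syncRoot)
    {
        // Either field may already be null (for example, Write() closes the writer when idle).
        if(_searcher != null)
        {
            _searcher.Close();
            _searcher.Dispose();
            _searcher = null;
        }

        if(_writer != null)
        {
            _writer.Close();
            _writer.Dispose();
            _writer = null;
        }
    }
}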

and calling that in Application_End, but if I have any active searchers or writers, is this going to result in a corrupt index?

Any help or suggestions are much appreciated. Thanks.

Leland Richardson asked Jul 06 '12 00:07


1 Answer

Lucene.NET is very thread safe. I can say for sure that all of the methods on the IndexWriter and IndexReader classes are thread-safe and you can use them without having to worry about synchronization. You can get rid of all of your code that involves synchronizing around instances of these classes.
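As a rough illustration of how much of the locking can go away, here is a minimal sketch of a manager that shares one IndexWriter across all threads and never locks around it. It assumes the Lucene.Net 2.9.x API that the question targets (later versions rename several members, for example Close() to Dispose()). The class name SharedIndex and the index path are hypothetical, since the original CreateWriter() is not shown, and opening a fresh reader per search is a simplification rather than a tuned design.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory;   // avoid clash with System.IO.Directory
using Version = Lucene.Net.Util.Version;        // avoid clash with System.Version

public sealed class SharedIndex
{
    // Hypothetical location; the question hides how its writer is created.
    private const string IndexPath = @"C:\IndexDirectory\test-index";

    // Lazy<T> (.NET 4+) makes creation of the singleton itself thread-safe.
    private static readonly Lazy<SharedIndex> _instance = new Lazy<SharedIndex>(
        () => new SharedIndex(FSDirectory.Open(new DirectoryInfo(IndexPath))));

    public static SharedIndex Instance { get { return _instance.Value; } }

    // A single writer shared by every thread; no lock is taken around it.
    private readonly IndexWriter _writer;

    // Public so that an in-memory RAMDirectory can be passed in for experiments.
    public SharedIndex(Directory directory)
    {
        _writer = new IndexWriter(directory, new StandardAnalyzer(Version.LUCENE_29),
                                  IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void Write(IEnumerable<Document> docs)
    {
        foreach (Document doc in docs)
            _writer.AddDocument(doc);
        _writer.Commit();
    }

    // searchMethod must materialize its results before returning,
    // because the reader is closed as soon as it completes.
    public T Search<T>(Func<IndexSearcher, T> searchMethod)
    {
        IndexReader reader = _writer.GetReader();
        try
        {
            return searchMethod(new IndexSearcher(reader));
        }
        finally
        {
            reader.Close();
        }
    }

    // Intended for Application_End: closing the writer releases write.lock.
    public void Close()
    {
        _writer.Close();
    }
}
```

The writer stays open for the lifetime of the application instead of being closed after each batch, so there are no active-writer counters to maintain. A real implementation would probably cache the searcher and reopen it only when the index has changed.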

That said, the bigger problem is using Lucene.NET from ASP.NET. ASP.NET recycles the application pool for a number of reasons; while it is shutting down one application domain, it brings up another one to handle new requests to the site.

If you try to access the same physical files (assuming you are using the file-system based FSDirectory) with a different IndexWriter/IndexReader, then you'll get an error as the lock on the files hasn't been released by the application domain that hasn't been shut down yet.
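If you stay inside ASP.NET anyway, one way to ride out that overlap is to retry opening the writer for a short while. The helper below is only a sketch: LockRetry and its parameters are hypothetical names, and it is generic so that it does not depend on any particular Lucene.Net version.

```csharp
using System;
using System.Threading;

public static class LockRetry
{
    // Calls open() until it succeeds or the attempts run out.
    // Only exceptions of type TLockError are retried; anything else propagates at once.
    public static T Open<T, TLockError>(Func<T> open, int attempts, TimeSpan delay)
        where TLockError : Exception
    {
        if (attempts < 1) throw new ArgumentOutOfRangeException("attempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return open();
            }
            catch (TLockError)
            {
                // Give up after the last attempt and surface the original error.
                if (attempt >= attempts) throw;
                Thread.Sleep(delay);
            }
        }
    }
}
```

With Lucene.Net, the call would presumably look like LockRetry.Open<IndexWriter, LockObtainFailedException>(() => CreateWriter(), 10, TimeSpan.FromSeconds(1)), where LockObtainFailedException lives in Lucene.Net.Store and CreateWriter() is the asker's own method. This only papers over short overlaps; it does not help if the old application domain takes longer to unload than the retry window.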

To that end, the recommended best practice is to control the process that is handling the access to Lucene.NET; this usually means creating a service in which you'd expose your operations via Remoting or WCF (preferably the latter).

It's more work this way (as you'd have to create all of the abstractions to represent your operations), but you gain the following benefits:

  • The service process will always be up, which means that the clients (the ASP.NET application) won't have to worry about contending for the files that FSDirectory requires. They simply have to call the service.

  • You're abstracting your search operations on a higher level. You aren't accessing Lucene.NET directly; rather, you're defining the operations and the types that are required for those operations. Once you have that abstracted away, if you decide to move from Lucene.NET to some other search mechanism (say RavenDB), then it's a matter of changing the implementation of the contract.
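To make the contract idea concrete, here is a minimal sketch of what such a WCF contract might look like. Every name in it (ISearchService, SearchHit, the operations and their parameters, the named-pipe binding) is a hypothetical choice for illustration; the real operations depend on what your application needs.

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

// A result type that says nothing about Lucene.NET,
// so the implementation behind the contract can change later.
[DataContract]
public class SearchHit
{
    [DataMember] public string Id { get; set; }
    [DataMember] public string Title { get; set; }
    [DataMember] public float Score { get; set; }
}

[ServiceContract]
public interface ISearchService
{
    // Runs a query against the named index and returns at most maxResults hits.
    [OperationContract]
    List<SearchHit> Search(string indexName, string queryText, int maxResults);

    // Asks the service to rebuild the named index; the web application never touches the files.
    [OperationContract]
    void Rebuild(string indexName);
}

public static class SearchClient
{
    // Creates a client proxy, assuming the service listens on a named pipe on the same machine.
    public static ISearchService Connect(string address)
    {
        var factory = new ChannelFactory<ISearchService>(
            new NetNamedPipeBinding(), new EndpointAddress(address));
        return factory.CreateChannel();
    }
}
```

The ASP.NET application would then call something like SearchClient.Connect("net.pipe://localhost/search").Search("test-index", "lucene", 10), while the service process alone owns the IndexWriter and the write.lock file.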

casperOne answered Oct 25 '22 12:10