How to implement search with multiple filters using lucene.net

Tags:

lucene.net

I'm new to lucene.net. I want to implement search functionality on a client database. I have the following scenario:

  • Users will search for clients based on the currently selected city.
  • If the user wants to search for clients in another city, then he has to change the city and perform the search again.
  • To refine the search results, we need to provide filters on Area (multiple values), Pincode, etc. In other words, I need the Lucene equivalents of the following SQL queries:

    SELECT * FROM CLIENTS
         WHERE CITY = N'City1'
         AND (Area like N'%area1%' OR Area like N'%area2%')
    
    SELECT * FROM CLIENTS
        WHERE CITY IN ('MUMBAI', 'DELHI')
        AND CLIENTTYPE IN ('GOLD', 'SILVER')
    

Below is the code I've implemented to provide search with city as a filter:

private static IEnumerable<ClientSearchIndexItemDto> _search(string searchQuery, string city, string searchField = "")
{
    // validation
    if (string.IsNullOrEmpty(searchQuery.Replace("*", "").Replace("?", "")))
        return new List<ClientSearchIndexItemDto>();

    // set up Lucene searcher
    using (var searcher = new IndexSearcher(_directory, false))
    {
        var hits_limit = 1000;
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // search by single field
        if (!string.IsNullOrEmpty(searchField))
        {
            var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, searchField, analyzer);
            var query = parseQuery(searchQuery, parser);
            var hits = searcher.Search(query, hits_limit).ScoreDocs;
            var results = _mapLuceneToDataList(hits, searcher);
            analyzer.Close(); // the enclosing using block disposes the searcher
            return results;
        }
        else // search by multiple fields (ordered by RELEVANCE)
        {
            var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[]
            {
                "ClientId",
                "ClientName",
                "ClientTypeNames",
                "CountryName",
                "StateName",
                "DistrictName",
                "City",
                "Area",
                "Street",
                "Pincode",
                "ContactNumber",
                "DateModified"
            }, analyzer);
            var query = parseQuery(searchQuery, parser);
            var f = new FieldCacheTermsFilter("City",new[] { city });
            var hits = searcher.Search(query, f, hits_limit, Sort.RELEVANCE).ScoreDocs;
            var results = _mapLuceneToDataList(hits, searcher);
            analyzer.Close(); // the enclosing using block disposes the searcher
            return results;
        }
    }
}

Now I need to add more filters on Area, Pincode, etc., where Area can take multiple values. I tried a BooleanQuery like the one below:

var cityFilter = new TermQuery(new Term("City", city));
var areasFilter = new FieldCacheTermsFilter("Area", areas); // areas is a string[]

BooleanQuery filterQuery = new BooleanQuery();
filterQuery.Add(cityFilter, Occur.MUST);
filterQuery.Add(areasFilter, Occur.MUST); // here filterQuery.Add has no overload that accepts string[]

If we perform the same operation with a single area, it works fine.

I've also tried ChainedFilter, as shown below, but it doesn't seem to satisfy the requirement. The code below performs an OR operation between the city and the areas, whereas the requirement is an OR between the given areas, restricted to the given city.

var f = new ChainedFilter(new Filter[] { cityFilter, areasFilter });

Can anybody suggest how to achieve this in Lucene.Net? Your help will be appreciated.

MSRS asked May 26 '14 09:05


2 Answers

You're looking for the BooleanFilter. Almost any query object has a matching filter object.

Look into TermsFilter (from Lucene.Net.Contrib.Queries) if your indexing doesn't match the requirements of FieldCacheTermsFilter. From the documentation of the latter: "this filter requires that the field contains only a single term for all documents".

var cityFilter = new FieldCacheTermsFilter("CITY", new[] {"MUMBAI", "DELHI"});
var clientTypeFilter = new FieldCacheTermsFilter("CLIENTTYPE", new [] { "GOLD", "SILVER" });

var areaFilter = new TermsFilter();
areaFilter.AddTerm(new Term("Area", "area1"));
areaFilter.AddTerm(new Term("Area", "area2"));

var filter = new BooleanFilter();
filter.Add(new FilterClause(cityFilter, Occur.MUST));
filter.Add(new FilterClause(clientTypeFilter, Occur.MUST));
filter.Add(new FilterClause(areaFilter, Occur.MUST));

IndexSearcher searcher = null; // TODO.
Query query = null; // TODO.
Int32 hits_limit = 0; // TODO.
var hits = searcher.Search(query, filter, hits_limit, Sort.RELEVANCE).ScoreDocs;
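For a way to try this pattern end to end, here is a rough sketch against a small in-memory index. It is only an illustration under stated assumptions: the class and method names (FilterDemo, Client, CountMatches) and the sample data are made up, the field names follow the question ("City", "Area") rather than the SQL column names, and BooleanFilter and FilterClause are assumed to come from the Lucene.Net.Contrib.Queries package in the Lucene.Net.Search namespace. The fields are indexed as NOT_ANALYZED so each document holds exactly one term per field, which is what FieldCacheTermsFilter requires.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static class FilterDemo
{
    // Hypothetical helper: one document per client, each field stored as a single exact term.
    private static Document Client(string name, string city, string area)
    {
        var doc = new Document();
        doc.Add(new Field("ClientName", name, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("City", city, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("Area", area, Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    // Counts clients in the given city whose area is any one of the given areas.
    public static int CountMatches(string city, string[] areas)
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // Made-up sample data; disposing the writer commits the documents.
        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            writer.AddDocument(Client("A", "MUMBAI", "area1"));
            writer.AddDocument(Client("B", "MUMBAI", "area3"));
            writer.AddDocument(Client("C", "DELHI", "area1"));
        }

        // City AND (any of the areas): each terms filter is an OR over its own terms,
        // and the two MUST clauses combine the filters with AND.
        var filter = new BooleanFilter();
        filter.Add(new FilterClause(new FieldCacheTermsFilter("City", new[] { city }), Occur.MUST));
        filter.Add(new FilterClause(new FieldCacheTermsFilter("Area", areas), Occur.MUST));

        using (var searcher = new IndexSearcher(directory, true))
        {
            // MatchAllDocsQuery stands in for the user's text query.
            return searcher.Search(new MatchAllDocsQuery(), filter, 10).ScoreDocs.Length;
        }
    }
}
```

With this sample data, CountMatches("MUMBAI", new[] { "area1", "area2" }) should match only client A. Values are compared as exact terms, so the case used at search time has to match the case used at indexing time.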
sisve answered Jan 02 '23 01:01


What you are looking for is nested boolean queries: an OR group (on your cities) that is itself combined with the other filters using AND.

filter1 AND filter2 AND filter3 AND (filtercity1 OR filtercity2 OR filtercity3)

There is already a good description of how to do this here:

How to create nested boolean query with lucene API (a AND (b OR c))?
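Applied to the question's case (one city, several areas), the nesting could look roughly like the sketch below, which uses only core Lucene.Net 3.0.x query classes. The names ClientFilters and BuildCityAndAreasFilter are hypothetical, and the sketch assumes City and Area were indexed so that each value is a single exact term (for example with Field.Index.NOT_ANALYZED). TermQuery matches exact terms, so this is not a substring match like the SQL LIKE '%area1%'.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class ClientFilters
{
    // Builds: City == city AND (Area == areas[0] OR Area == areas[1] OR ...)
    public static Filter BuildCityAndAreasFilter(string city, string[] areas)
    {
        // Outer group: every clause added with MUST is required (AND).
        var cityAndAreas = new BooleanQuery();
        cityAndAreas.Add(new TermQuery(new Term("City", city)), Occur.MUST);

        if (areas != null && areas.Length > 0)
        {
            // Inner group: SHOULD clauses mean any one of the areas is enough (OR).
            var anyArea = new BooleanQuery();
            foreach (var area in areas)
                anyArea.Add(new TermQuery(new Term("Area", area)), Occur.SHOULD);

            // The whole OR group is itself a required clause of the outer query.
            cityAndAreas.Add(anyArea, Occur.MUST);
        }

        // Wrap the query so it can be passed as the Filter argument of Search.
        return new QueryWrapperFilter(cityAndAreas);
    }
}
```

The returned filter can be passed in place of the single-city filter, for example searcher.Search(query, ClientFilters.BuildCityAndAreasFilter(city, areas), hits_limit, Sort.RELEVANCE). When no areas are given, the sketch falls back to filtering by city only, because an empty required group would match nothing.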

Ronan Thibaudau answered Jan 02 '23 02:01