
Multiple Field Query handling in Lucene

Tags:

lucene

I have written an index searcher in Lucene that searches multiple fields of an indexed database.

It takes the query as two strings: a title and a city name.

The indexed database has three fields: title, address, and city.

A hit should occur only if both the title and the city name match. For that purpose I wrote the following searcher code using MultiFieldQueryParser, with the help of a post:

public void searchdb(String myQuery, String myCity) throws Exception
{
    System.out.println("Searching in the database ...");
    String[] fields={"title","address","city"};
    MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_CURRENT, fields, new StandardAnalyzer(Version.LUCENE_CURRENT));
    parser.setDefaultOperator(QueryParser.Operator.AND);
    if(!myQuery.toLowerCase().contains(myCity.toLowerCase()))
    {
        myQuery="title:"+myQuery+" "+"address:"+myQuery+" "+myCity+" "+"city:"+myCity;
    }
    Query query=parser.parse(myQuery);
    if (query instanceof BooleanQuery) 
    {
        BooleanClause.Occur[] flags ={BooleanClause.Occur.MUST,BooleanClause.Occur.SHOULD,BooleanClause.Occur.MUST};
        BooleanQuery booleanQuery = (BooleanQuery) query;
        BooleanClause[] clauses = booleanQuery.getClauses();
        System.out.println("Query="+booleanQuery.toString()+" and Number of clauses="+clauses.length);
        for (int i = 0; i < clauses.length; i++) 
        {
            clauses[i].setOccur(flags[i]);
        }
        Directory dir=FSDirectory.open(new File("demoIndex"));
        IndexSearcher searcher = new IndexSearcher(dir, true);
        TopDocs hits = searcher.search(booleanQuery, 20);
        searcher.close();
        dir.close();
        System.out.println("Number of hits="+hits.totalHits);
    }
}

But it does not work properly.

For example, if the query is "Pizza Hut" and the city is "Mumbai", I want "Pizza Hut" to be searched only in the title field of the database and "Mumbai" only in the city field.

Instead, it also looks for "Hut" in the city field: booleanQuery.toString() prints "+title:pizza +(title:hut city:hut) +city:mumbai".

As a result, the for loop throws an index-out-of-bounds error.

I am new to Lucene, so I am asking for help fixing the problem.

Joy asked Mar 31 '13 10:03



1 Answer

MultiFieldQueryParser is meant for searching the same keyword(s) in multiple fields.
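To see why the question's query came out the way it did, here is a toy model of that behaviour in plain Java. It is not Lucene code: FieldExpansionSketch and expand are hypothetical names, and the whitespace splitting and lowercasing only roughly stand in for what StandardAnalyzer and the real parser do. The assumption it illustrates is that a term with an explicit field prefix stays on that field, while a bare term is tried against every configured field.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical class, not part of Lucene).
class FieldExpansionSketch {

    // Returns one required top-level clause per whitespace-separated term.
    static List<String> expand(String query, String[] fields) {
        List<String> clauses = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            String term = token.toLowerCase();
            if (term.contains(":")) {
                // Explicit prefix such as "title:pizza": bound to that one field.
                clauses.add("+" + term);
            } else {
                // Bare term such as "hut": spread over all configured fields.
                StringBuilder group = new StringBuilder("+(");
                for (int i = 0; i < fields.length; i++) {
                    if (i > 0) {
                        group.append(' ');
                    }
                    group.append(fields[i]).append(':').append(term);
                }
                clauses.add(group.append(')').toString());
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Two fields, short query.
        List<String> shortQuery = expand("title:Pizza Hut city:Mumbai",
                new String[] {"title", "city"});
        System.out.println(String.join(" ", shortQuery));

        // Three fields, the string that searchdb concatenates for "Pizza Hut" / "Mumbai".
        List<String> builtQuery = expand("title:Pizza Hut address:Pizza Hut Mumbai city:Mumbai",
                new String[] {"title", "address", "city"});
        System.out.println(builtQuery.size() + " top-level clauses");
    }
}
```

In this model the prefix title: attaches only to Pizza, so Hut is a bare term and gets spread over every field, which is the shape reported in the question. For the longer string that searchdb builds, the model yields six top-level clauses, more than the three entries in the flags array, which would be consistent with the out-of-bounds error.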

Your use case is simpler, because you already have the title keyword and the city keyword as separate strings. Build one query per field and combine them, as in the following code.

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

// city query: parse the city keyword against the "city" field only
QueryParser cityQP = new QueryParser(Version.LUCENE_CURRENT, "city", analyzer);
Query cityQuery = cityQP.parse(myCity);

// title query: parse the title keyword against the "title" field only
QueryParser titleQP = new QueryParser(Version.LUCENE_CURRENT, "title", analyzer);
Query titleQuery = titleQP.parse(myQuery);

// final query: both sub-queries are required
BooleanQuery finalQuery = new BooleanQuery();
finalQuery.add(cityQuery, BooleanClause.Occur.MUST);  // MUST means the clause has to match.
finalQuery.add(titleQuery, BooleanClause.Occur.MUST); // Using MUST on every clause is equivalent to the AND operator.
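The effect of combining two required clauses can be checked without an index. The following is a minimal sketch in plain Java, not Lucene code: MustClauseSketch, fieldContainsAll, and doc are hypothetical names, and documents are modelled as simple field-to-text maps. One assumption to note: within a field the toy requires every word to be present, which corresponds to calling setDefaultOperator(QueryParser.Operator.AND) on the per-field parser. With QueryParser's default OR operator, any one of the words would be enough.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only (hypothetical class, not part of Lucene).
class MustClauseSketch {

    // Stand-in for a single-field query: true when the field's text
    // contains every keyword as a whole word, ignoring case.
    static Predicate<Map<String, String>> fieldContainsAll(String field, String keywords) {
        List<String> wanted = Arrays.asList(keywords.toLowerCase().split("\\s+"));
        return document -> {
            String text = document.getOrDefault(field, "").toLowerCase();
            return Arrays.asList(text.split("\\s+")).containsAll(wanted);
        };
    }

    // Builds a document with the three fields used in the question.
    static Map<String, String> doc(String title, String address, String city) {
        Map<String, String> fields = new HashMap<>();
        fields.put("title", title);
        fields.put("address", address);
        fields.put("city", city);
        return fields;
    }

    public static void main(String[] args) {
        // Two required clauses: Predicate.and plays the role of MUST + MUST.
        Predicate<Map<String, String>> finalQuery =
                fieldContainsAll("title", "Pizza Hut").and(fieldContainsAll("city", "Mumbai"));

        // Title and city both match.
        System.out.println(finalQuery.test(doc("Pizza Hut", "Link Road", "Mumbai")));
        // "Mumbai" appears only in the address, so the city clause fails.
        System.out.println(finalQuery.test(doc("Pizza Hut", "Mumbai Street", "Delhi")));
        // "Hut" appears only in the address, so the title clause fails.
        System.out.println(finalQuery.test(doc("Mumbai Cafe", "Hut Lane", "Mumbai")));
    }
}
```

Only the first document satisfies both clauses. Because each clause looks at exactly one field, a keyword that turns up in some other field cannot produce a hit.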
phanin answered Oct 13 '22 00:10
