
NOT operator doesn't work in Lucene query

I use Lucene.NET version 3.0.3.0, but some of the expressions I search for don't work as I expect. For example, if I search for "!Fiesta OR Astra" on the field "Model", only "vauxhallAstra" is returned; "fordFocus" is not. My code is below:

    // Three sample documents. "Model" and "Make" are analyzed, so StandardAnalyzer lowercases their terms.
    var fordFiesta = new Document();
    fordFiesta.Add(new Field("Id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    fordFiesta.Add(new Field("Make", "Ford", Field.Store.YES, Field.Index.ANALYZED));
    fordFiesta.Add(new Field("Model", "Fiesta", Field.Store.YES, Field.Index.ANALYZED));

    var fordFocus = new Document();
    fordFocus.Add(new Field("Id", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
    fordFocus.Add(new Field("Make", "Ford", Field.Store.YES, Field.Index.ANALYZED));
    fordFocus.Add(new Field("Model", "Focus", Field.Store.YES, Field.Index.ANALYZED));

    var vauxhallAstra = new Document();
    vauxhallAstra.Add(new Field("Id", "3", Field.Store.YES, Field.Index.NOT_ANALYZED));
    vauxhallAstra.Add(new Field("Make", "Vauxhall", Field.Store.YES, Field.Index.ANALYZED));
    vauxhallAstra.Add(new Field("Model", "Astra", Field.Store.YES, Field.Index.ANALYZED));

    // Create (or overwrite) the index on disk and add the documents.
    Directory directory = FSDirectory.Open(new DirectoryInfo(Environment.CurrentDirectory + "\\LuceneIndex"));
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);

    var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
    writer.AddDocument(fordFiesta);
    writer.AddDocument(fordFocus);
    writer.AddDocument(vauxhallAstra);
    writer.Optimize();
    writer.Close();

    // Search the "Model" field with the query in question.
    IndexReader indexReader = IndexReader.Open(directory, true);
    Searcher indexSearch = new IndexSearcher(indexReader);

    var queryParser = new QueryParser(Version.LUCENE_30, "Model", analyzer);
    var query = queryParser.Parse("!Fiesta OR Astra");

    Console.WriteLine("Searching for: " + query.ToString());
    TopDocs resultDocs = indexSearch.Search(query, 200);
    // TotalHits is the number of matches; MaxScore is only the best score.
    Console.WriteLine("Results Found: " + resultDocs.TotalHits);

    var hits = resultDocs.ScoreDocs;
    foreach (var hit in hits)
    {
        var documentFromSearcher = indexSearch.Doc(hit.Doc);
        Console.WriteLine(documentFromSearcher.Get("Make") + " " + documentFromSearcher.Get("Model"));
    }

    indexSearch.Close();
    directory.Close();

    Console.ReadKey();

Iman 1989 asked Feb 15 '23 17:02


1 Answer

!Fiesta OR Astra doesn't mean what you think it means. The !Fiesta portion does not mean "get everything except Fiesta", as you might expect, but something more like "forbid Fiesta". A NOT term in a Lucene query only filters out results; it does not find anything on its own.

The only clause in your query that actually fetches results is Astra. So everything containing Astra is found, and then anything containing Fiesta is filtered out. fordFocus matches neither term, so nothing ever selects it.
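
To make the filtering behaviour concrete, here is a small sketch that models it with plain sets of document ids. It is a toy model, not Lucene code: the Occur enum, BooleanModel class, and Evaluate method are names invented for this illustration, and scoring is ignored entirely. The doc ids 1, 2, 3 stand for the Fiesta, Focus, and Astra documents from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: these names are illustrative and are not Lucene types.
public enum Occur { Should, Must, MustNot }

public static class BooleanModel
{
    // Each clause pairs an occurrence flag with the set of doc ids its term matches.
    public static SortedSet<int> Evaluate(IEnumerable<(Occur occur, ISet<int> docs)> clauses)
    {
        var list = clauses.ToList();
        var must = list.Where(c => c.occur == Occur.Must).Select(c => c.docs).ToList();
        var should = list.Where(c => c.occur == Occur.Should).Select(c => c.docs).ToList();
        var mustNot = list.Where(c => c.occur == Occur.MustNot).Select(c => c.docs).ToList();

        var result = new SortedSet<int>();
        if (must.Count > 0)
        {
            // Required clauses select candidates: a doc has to match all of them.
            result.UnionWith(must[0]);
            foreach (var docs in must.Skip(1)) result.IntersectWith(docs);
        }
        else
        {
            // With no required clause, a doc has to match at least one optional clause.
            foreach (var docs in should) result.UnionWith(docs);
        }

        // Prohibited clauses never select anything; they only remove candidates.
        foreach (var docs in mustNot) result.ExceptWith(docs);
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        ISet<int> all = new HashSet<int> { 1, 2, 3 };
        ISet<int> fiesta = new HashSet<int> { 1 };
        ISet<int> astra = new HashSet<int> { 3 };

        // "!Fiesta OR Astra": one prohibited clause plus one optional clause.
        var original = BooleanModel.Evaluate(new (Occur, ISet<int>)[]
        {
            (Occur.MustNot, fiesta),
            (Occur.Should, astra),
        });
        Console.WriteLine(string.Join(", ", original)); // 3

        // Giving the prohibited clause something to filter (every doc) brings Focus back.
        ISet<int> notFiesta = BooleanModel.Evaluate(new (Occur, ISet<int>)[]
        {
            (Occur.Should, all),
            (Occur.MustNot, fiesta),
        });
        var widened = BooleanModel.Evaluate(new (Occur, ISet<int>)[]
        {
            (Occur.Should, astra),
            (Occur.Should, notFiesta),
        });
        Console.WriteLine(string.Join(", ", widened)); // 2, 3
    }
}
```

Note that a query made up only of prohibited clauses evaluates to the empty set in this model, which is the reason a bare NOT term finds nothing.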

In order to perform the query I believe you are expecting, you would need something like:

Astra OR (*:* !Fiesta)

Here *:* is parsed as a MatchAllDocsQuery. Because this query has to match every document in the index before filtering, it can be expected to perform poorly.
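
If you would rather build that query in code than go through the parser, something along these lines should be equivalent. This is a sketch that assumes Lucene.NET 3.0.x, where Occur is a top-level enum in Lucene.Net.Search; the ModelQueries class and AstraOrNotFiesta method are hypothetical names. The terms are written in lowercase because TermQuery is not analyzed, while StandardAnalyzer lowercased the Model values at index time.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;

public static class ModelQueries
{
    // Hypothetical helper: programmatic form of  Astra OR (*:* !Fiesta)  on the "Model" field.
    public static Query AstraOrNotFiesta()
    {
        // Inner query: start from every document, then forbid Fiesta.
        var notFiesta = new BooleanQuery();
        notFiesta.Add(new MatchAllDocsQuery(), Occur.SHOULD);
        notFiesta.Add(new TermQuery(new Term("Model", "fiesta")), Occur.MUST_NOT);

        // Outer query: either clause is enough for a document to match.
        var query = new BooleanQuery();
        query.Add(new TermQuery(new Term("Model", "astra")), Occur.SHOULD);
        query.Add(notFiesta, Occur.SHOULD);
        return query;
    }
}
```

Passing the result to indexSearch.Search(query, 200) in the question's code should then return both the Focus and the Astra documents.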


Confusing interpretations of "boolean" logic like this are why I really don't like the AND/OR/NOT syntax for Lucene. The +/- syntax is much clearer, more powerful, and doesn't introduce oddball gotchas like this one.

This excellent article on the topic clarifies somewhat why you should be thinking in terms of MUST/MUST_NOT/SHOULD, rather than traditional boolean logic.
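
One way to see how the two syntaxes relate is to print what the parser produces for each, since Query.ToString() writes the parsed query back out in +/- notation. The sketch below assumes Lucene.NET 3.0.x and the same analyzer and default field as in the question; SyntaxComparison and Parsed are hypothetical names. If the parser behaves as described above, each pair of lines should print the same query.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using LuceneVersion = Lucene.Net.Util.Version;

public static class SyntaxComparison
{
    // Parses the text against the "Model" field and returns Lucene's own rendering of the query.
    public static string Parsed(string text)
    {
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
        var parser = new QueryParser(LuceneVersion.LUCENE_30, "Model", analyzer);
        return parser.Parse(text).ToString();
    }

    public static void Main()
    {
        // The query from the question, in both syntaxes.
        Console.WriteLine(Parsed("!Fiesta OR Astra"));
        Console.WriteLine(Parsed("-Fiesta Astra"));

        // The corrected query, in both syntaxes.
        Console.WriteLine(Parsed("Astra OR (*:* !Fiesta)"));
        Console.WriteLine(Parsed("Astra (*:* -Fiesta)"));
    }
}
```

A term prefixed with - in the output is a MUST_NOT clause, a term prefixed with + is MUST, and a term with no prefix is SHOULD.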

femtoRgon answered Mar 11 '23 09:03