How do I perform an AND search in Lucene.net when multiple words are used in a search?

I am playing around with Lucene.net to try to get a handle on how to implement it in my application.

I have the following code:

            .....
            // Add 2 documents
            var doc1 = new Document();
            var doc2 = new Document();

            doc1.Add(new Field("id", "doc1", Field.Store.YES, Field.Index.ANALYZED));
            doc1.Add(new Field("content", "This is my first document", Field.Store.YES, Field.Index.ANALYZED));
            doc2.Add(new Field("id", "doc2", Field.Store.YES, Field.Index.ANALYZED));
            doc2.Add(new Field("content", "The big red fox jumped", Field.Store.YES, Field.Index.ANALYZED));

            writer.AddDocument(doc1);
            writer.AddDocument(doc2);

            writer.Optimize();
            writer.Close();

            // Search for doc2
            var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "content", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29));
            var query = parser.Parse("big abcdefg test1234");
            var searcher = new IndexSearcher(indexDirectory, true);
            var hits = searcher.Search(query);

            Assert.AreEqual(1, hits.Length());

            var document = hits.Doc(0);

            Assert.AreEqual("doc2", document.Get("id"));
            Assert.AreEqual("The big red fox jumped", document.Get("content"));

This test passes, which dismays me a bit. I assume this means that Lucene.Net combines the terms of a search with OR rather than AND, but I can't find any information on how to actually perform an AND search.

The end result I am going for is that if someone searches for "Matthew Anderson", I don't want it to bring up documents that refer to "Matthew Doe", as that isn't relevant in any way, shape or form.

KallDrexx asked May 06 '11 22:05

2 Answers

A. If you require all words to be in a document, but don't require them to be consecutive or in the order you specify, the query

+big +red

matches

* the big red fox jumped
* the red big fox jumped
* the big fast red fox jumped

but does not match

* the small red fox jumped

B. If you instead want to match a phrase (i.e. all words are required, and they have to be consecutive and in the order specified), the query

+"big red"

matches

* the big red fox jumped

but does not match

* the red big fox jumped
* the big fast red fox jumped
* the small red fox jumped
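
The difference between A and B can be sketched with a toy matcher. This is plain Python, not the Lucene.Net API: the function names (tokenize, matches_any, matches_all, matches_phrase) are made up for illustration, and the lowercase-and-split tokenizer is only a rough stand-in for what an analyzer does.

```python
# Toy model of the query styles above; this does not call Lucene at all.

def tokenize(text):
    # Lowercase and split on whitespace (a crude stand-in for an analyzer).
    return text.lower().split()

def matches_any(doc, terms):
    # Behaviour observed in the question: one matching term is enough (OR).
    tokens = set(tokenize(doc))
    return any(term in tokens for term in terms)

def matches_all(doc, terms):
    # Like "+big +red": every term is required, order and adjacency ignored.
    tokens = set(tokenize(doc))
    return all(term in tokens for term in terms)

def matches_phrase(doc, terms):
    # Like '+"big red"': the terms must be consecutive and in this order.
    tokens = tokenize(doc)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

docs = [
    "the big red fox jumped",
    "the red big fox jumped",
    "the big fast red fox jumped",
    "the small red fox jumped",
]

for doc in docs:
    print(doc, matches_all(doc, ["big", "red"]), matches_phrase(doc, ["big", "red"]))
```

Under this model the first three documents satisfy the required-terms check and only the first satisfies the phrase check, which lines up with the lists in A and B.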
Kai Chan answered Sep 28 '22 13:09


What do you get when your query is var query = parser.Parse("+big +abcdefg +test1234"); instead? The + prefix should cause the parser to require all terms to be present in matching documents. Another possibility is to construct the query programmatically:

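// Every clause is MUST, so a matching document has to contain all three terms.
// "content" is the field searched in the question. TermQuery is not analyzed, so terms must be lowercase.
// Written for Lucene.Net 2.9; in Lucene.Net 3.x the constant is Occur.MUST.
BooleanQuery query = new BooleanQuery();
query.Add(new TermQuery(new Term("content", "big")), BooleanClause.Occur.MUST);
query.Add(new TermQuery(new Term("content", "abcdefg")), BooleanClause.Occur.MUST);
query.Add(new TermQuery(new Term("content", "test1234")), BooleanClause.Occur.MUST);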
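
For search text typed by a user, such as "Matthew Anderson", the + prefixes could be added before the string reaches the parser. Below is a minimal sketch in plain Python rather than C#. The helper name require_all_terms is hypothetical, and it assumes the input contains no query-syntax characters (quotes, +, -, parentheses and so on), which real input would need to have escaped first.

```python
def require_all_terms(user_input):
    # Prefix each whitespace-separated word with '+' so that a Lucene-style
    # query parser treats every word as required.
    # Assumption: the words contain no query-syntax special characters.
    return " ".join("+" + word for word in user_input.split())

print(require_all_terms("Matthew Anderson"))  # +Matthew +Anderson
```

The resulting string would then be passed to parser.Parse in place of the raw input.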
Gene Golovchinsky answered Sep 28 '22 12:09