
Lucene.Net: multi-word search across multiple fields, with wildcard, phrase, and fuzzy search

I am very new to Lucene.Net. I index data into multiple fields with Lucene.Net. This is how I index the data:

Document doc = new Document();
doc.Add(new Field("ID", oData.ID.ToString() + "_" + oData.Type, Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("Title", oData.Title, Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("Description", oData.Description, Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("Url", oData.Url, Field.Store.YES, Field.Index.TOKENIZED));
writer.AddDocument(doc);

When a user searches, they can enter input such as: Audi BMW ECU

1) First, I want each word, [Audi], [BMW], and [ECU], to be searched against all three fields I indexed: Title, Description, and Url. What code do I need to write for that?

2) Second, the phrase "Audi BMW ECU" should be searched against the Title, Description, and Url fields.

3) The user may include wildcards in the search, such as Audi BMW ECU* or Audi BMW ECU?

4) I want to add fuzzy search alongside the multi-word search, so that results still come back when the user misspells a word.

Please guide me on how to combine all of this logic in my code so that I get results for every kind of user input.

If possible, please discuss this in detail.

Thomas asked Jul 05 '12 14:07


1 Answer

You can use the QueryParser class to parse user-provided queries into Lucene Query object trees. There's also a MultiFieldQueryParser which will generate queries searching over several fields. This matches what you're asking for.

var fields = new[] { "Title", "Description", "Url" };
var analyzer = new StandardAnalyzer(Version.LUCENE_30);
var queryParser = new MultiFieldQueryParser(Version.LUCENE_30, fields, analyzer);
var query = queryParser.Parse("Audi BMW ECU");

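The generated query looks like (Title:audi Description:audi Url:audi) (Title:bmw Description:bmw Url:bmw) (Title:ecu Description:ecu Url:ecu). The parser's default operator is OR, so a document matches when any of the words appears in any of the three fields.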

You could let the user build phrase queries by surrounding the phrase with quotes. This is the standard query format in Lucene.

var fields = new[] { "Title", "Description", "Url" };
var analyzer = new StandardAnalyzer(Version.LUCENE_30);
var queryParser = new MultiFieldQueryParser(Version.LUCENE_30, fields, analyzer);
var query = queryParser.Parse("\"Audi BMW ECU\"");

The generated query looks like Title:"audi bmw ecu" Description:"audi bmw ecu" Url:"audi bmw ecu".
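If the phrase search is meant to run as a second pass over the same raw input, the quoting can be done in code instead of by the user. A minimal sketch, assuming a hypothetical helper named PhraseQueryText (not part of Lucene.Net) whose only job is to produce the string handed to queryParser.Parse:

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net: it only builds the query
// text that is later passed to queryParser.Parse.
public static class PhraseQueryText
{
    // Wraps the raw user input in double quotes so the query parser
    // treats it as a single phrase.
    public static string ToPhrase(string input)
    {
        string text = (input ?? string.Empty).Trim();

        // Escape backslashes first, then embedded quotes, so that a quote
        // typed by the user cannot end the phrase early.
        text = text.Replace("\\", "\\\\").Replace("\"", "\\\"");

        return "\"" + text + "\"";
    }
}
```

With this, queryParser.Parse(PhraseQueryText.ToPhrase("Audi BMW ECU")) receives the same text as the quoted example above.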

The QueryParser also supports wildcard queries using * and ?, as you want. There's also support for fuzzy searches, for example audi~0.5, where the number is the minimum similarity a term must have to match. There are several other query types available, like proximity searches and term boosts. Everything is described in the Query Parser Syntax documentation.

Adding functionality to help users with misspelt words is a bigger undertaking. You could rewrite your query into fuzzy searches, but the query parser does not run fuzzy (or wildcard) terms through the analyzer, so you would lose any stemming you have. You could also experiment with different did-you-mean solutions by rewriting your queries into similar queries with more matches. There's a lot to experiment with here.
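As a rough illustration of the "rewrite your query into fuzzy searches" idea, the user's input can be rewritten as text before it reaches the parser. This is only a sketch: FuzzyQueryText and AddFuzzy are hypothetical names, not part of Lucene.Net, and the rule of leaving anything that is not a plain word untouched is an assumption about what you would want.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net: it rewrites the query text
// before it is passed to queryParser.Parse.
public static class FuzzyQueryText
{
    // Matches either a quoted phrase or a run of non-whitespace characters.
    private static readonly Regex Token = new Regex("\"[^\"]*\"|\\S+");

    // A plain word consists of letters and digits only.
    private static readonly Regex PlainWord = new Regex(@"^[\p{L}\p{N}]+$");

    // Appends "~" to every plain word so the parser builds a fuzzy query
    // for it. Quoted phrases, wildcard terms, the operators AND/OR/NOT,
    // and anything else that already carries query syntax pass through
    // unchanged.
    public static string AddFuzzy(string input)
    {
        var parts = new List<string>();
        foreach (Match match in Token.Matches(input ?? string.Empty))
        {
            string token = match.Value;
            bool isOperator = token == "AND" || token == "OR" || token == "NOT";
            bool isPlain = PlainWord.IsMatch(token) && !isOperator;
            parts.Add(isPlain ? token + "~" : token);
        }
        return string.Join(" ", parts);
    }
}
```

For example, AddFuzzy("Audi BMW ECU*") yields "Audi~ BMW~ ECU*", which can then go to queryParser.Parse. One option is to run the unmodified query first and fall back to the fuzzy version only when it returns too few hits.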

sisve answered Oct 15 '22 19:10