
Why does Lucene QueryParser need an Analyzer?

I'm new to Lucene and trying to parse a raw string into a Query using the QueryParser.

I was wondering, why does the QueryParser.Parse() method need an Analyzer parameter at all?

If analyzing has something to do with querying, then an Analyzer should be required when dealing with regular Query objects as well (TermQuery, BooleanQuery, etc.). If it does not, why does QueryParser require it?

haim770 asked Mar 05 '13 14:03



1 Answer

When indexing, Lucene uses an Analyzer to divide the text into atomic units (tokens). Many things can happen during this phase (e.g. lowercasing, stemming, removal of stopwords). The end result of analyzing each token is a term, and terms are what the index stores.

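Then, when you query, the same analysis has to be applied to the query text so that query terms can match indexed terms. QueryParser starts from a raw string, so it needs an Analyzer to do this, normally the same one that was used for indexing.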
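To make the index-time/query-time symmetry concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: ToyAnalysisDemo, analyze and index are hypothetical names, and the "analyzer" only splits on non-alphanumerics, lowercases, and drops a few stopwords. It assumes nothing about how Lucene implements these steps internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these names are hypothetical and are NOT Lucene classes.
class ToyAnalysisDemo {
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "of"));

    // Stand-in for an analyzer: tokenize, lowercase, remove stopwords.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            String term = token.toLowerCase(Locale.ROOT);
            if (!STOPWORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Inverted index: analyzed term -> ids of the documents containing it.
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> inverted = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : analyze(docs.get(id))) {
                inverted.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return inverted;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> inverted =
                index(Arrays.asList("The Quick Brown Fox", "A lazy dog"));

        // The raw query word is not in the index, because the index holds "quick".
        System.out.println(inverted.containsKey("Quick")); // false

        // Running the query through the same analysis produces a matching term.
        String queryTerm = analyze("Quick").get(0);
        System.out.println(inverted.get(queryTerm)); // [0]
    }
}
```

The lookup only succeeds once the query text has gone through the same analyze step as the documents, which is the role the Analyzer plays for a parser that starts from a raw string.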

Q: Why doesn't TermQuery require an analyzer?
A: QueryParser parses the query string and produces a TermQuery (it can also produce other types of queries, e.g. PhraseQuery). A TermQuery already contains a term in the same shape as it appears in the index, so there is nothing left to analyze. If you (as a programmer) are absolutely sure what you are doing, you can create a TermQuery yourself, but this assumes you know exactly how the text was analyzed and what the terms look like in the index.
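The difference in API shape can be sketched the same way. The classes below are hypothetical stand-ins (ToyTermQuery, ToyQueryParser), not Lucene's TermQuery or QueryParser, and the parser handles only a single word with a lowercasing "analyzer". The point is only that the parser takes raw user text and therefore needs an analyzer, while the term query takes a finished term and uses it verbatim.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a term query: holds a term and never analyzes it.
class ToyTermQuery {
    final String term;

    ToyTermQuery(String term) {
        this.term = term;
    }

    // Exact lookup against the terms stored in the (toy) index.
    boolean matches(Set<String> indexedTerms) {
        return indexedTerms.contains(term);
    }
}

// Hypothetical stand-in for a query parser: needs an analyzer because
// its input is raw user text, not an index-ready term.
class ToyQueryParser {
    private final UnaryOperator<String> analyzer;

    ToyQueryParser(UnaryOperator<String> analyzer) {
        this.analyzer = analyzer;
    }

    // Simplification: treats the whole input as one word.
    ToyTermQuery parse(String userInput) {
        return new ToyTermQuery(analyzer.apply(userInput));
    }
}

class ToyTermQueryDemo {
    public static void main(String[] args) {
        // Terms as they would look after lowercasing at index time.
        Set<String> indexedTerms = new HashSet<>(Arrays.asList("lucene", "analyzer"));
        UnaryOperator<String> lowercase = s -> s.toLowerCase(Locale.ROOT);

        // Hand-built query with unanalyzed text: no match.
        System.out.println(new ToyTermQuery("Lucene").matches(indexedTerms)); // false

        // Parser-built query: the analyzer normalizes the text first.
        ToyTermQuery parsed = new ToyQueryParser(lowercase).parse("Lucene");
        System.out.println(parsed.matches(indexedTerms)); // true
    }
}
```

Building the term query by hand works only if the caller supplies the term exactly as it was indexed, here "lucene" rather than "Lucene".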

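Q: Why doesn't BooleanQuery require an analyzer?
A: BooleanQuery just combines other queries using clauses (MUST, SHOULD, MUST_NOT, which roughly correspond to AND, OR, NOT in query syntax). It holds no text of its own, so there is nothing to analyze, and it is not really useful by itself without sub-queries.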

This is a very simplified answer. I highly recommend reading the book Introduction to Information Retrieval; it covers the theory on which Lucene (and other similar frameworks) is based. The book is available online for free.

mindas answered Sep 17 '22 17:09