Keyword (OR, AND) search in Lucene

Tags: java, lucene

I am using Lucene in my portal (J2EE based) for indexing and search services.

The problem concerns Lucene's operator keywords. When a search query uses one of them as a search term, you get an error.

For example:

searchTerms = "ik OR jij"

This works fine, because it searches for "ik" or "jij".

searchTerms = "ik AND jij"

This also works fine: it searches for "ik" and "jij".

But when you search:

searchTerms = "OR"
searchTerms = "AND"
searchTerms = "ik OR"
searchTerms = "OR ik"

and so on, it fails with an error:

Component Name: STSE_RESULTS  Class: org.apache.lucene.queryParser.ParseException  Message: Cannot parse 'OR jij': Encountered "OR" at line 1, column 0. 
Was expecting one of: 
... 

That makes sense, because these words are probably reserved by Lucene and act as operator keywords.

In Dutch, the word "OR" is important because it is the abbreviation of "Ondernemings Raad" (works council). It is used in many texts, and it needs to be findable. Searching for lowercase "or", for example, does work, but does not return texts matching the term "OR". How can I make it searchable?

How can I escape the keyword "OR"? Or, how can I tell Lucene to treat "OR" as a search term and not as a keyword?

Areca asked Aug 21 '09 11:08



1 Answer

I suppose you have tried putting the "OR" into double quotes?
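The quoting idea could be applied to user input before it reaches the query parser. What follows is only a minimal sketch in plain Java, not part of Lucene: the class OperatorQuoting and the method quoteDanglingOperators are hypothetical names, and the rule it uses (quote "AND" or "OR" only when it is the first or last word) is an assumption chosen to cover the failing inputs from the question while leaving "ik OR jij" alone. It does not handle input that already contains quotes, or two operators in a row in the middle of a query.

```java
final class OperatorQuoting {

    // Hypothetical helper: true for the two operator words discussed in the question.
    private static boolean isOperator(String token) {
        return token.equals("AND") || token.equals("OR");
    }

    /**
     * Wraps a leading or trailing "AND"/"OR" in double quotes so that the
     * query parser can read it as text instead of as a dangling operator.
     * Operators that sit between two other words are left untouched.
     */
    static String quoteDanglingOperators(String searchTerms) {
        String[] tokens = searchTerms.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            boolean atEdge = (i == 0) || (i == tokens.length - 1);
            if (i > 0) {
                out.append(' ');
            }
            if (atEdge && isOperator(token)) {
                out.append('"').append(token).append('"');
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteDanglingOperators("OR"));        // prints "OR" (with the quotes)
        System.out.println(quoteDanglingOperators("ik OR"));     // prints ik "OR"
        System.out.println(quoteDanglingOperators("OR ik"));     // prints "OR" ik
        System.out.println(quoteDanglingOperators("ik OR jij")); // prints ik OR jij
    }
}
```

The returned string would then be handed to the query parser in place of the raw searchTerms. Whether the quoted word actually matches documents still depends on the analyzer used for indexing and searching: an analyzer that lowercases terms or drops stop words may discard "or" even when it is quoted.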

If that doesn't work, I think you might have to change the Lucene source, as the operator "OR" is buried deep inside the code. Recompiling alone probably isn't enough: you'll have to change the file QueryParser.jj in the source package, which serves as input for JavaCC, then run JavaCC, and then recompile the whole thing.

The good news, however, is that there's only one line to change:

| <OR: ("OR" | "||") >

becomes

| <OR: ("||") >

That way, you'll have only "||" as the logical OR operator. There is a build.xml that also contains the invocation of JavaCC, but you have to download that tool yourself. I can't try it myself right now, I'm afraid.

This is perhaps a good question for the Lucene developer mailing list, but please let us know if you do that and they come up with a simpler solution ;-)

Robert Petermeier answered Oct 05 '22 01:10