
What is the best way to index a Boolean value in Lucene?

Tags:

java

lucene

I need to index a Boolean value (true/false) in Lucene; it does not need to be stored. I want to use as little disk space as possible and get the best search performance.

doc.add(new Field("boolean","true",Field.Store.NO,Field.Index.NOT_ANALYZED_NO_NORMS));
//or
doc.add(new Field("boolean","1",Field.Store.NO,Field.Index.NOT_ANALYZED_NO_NORMS));
//or
doc.add(new NumericField("boolean",Integer.MAX_VALUE,Field.Store.NO,true).setIntValue(1));

Which should I choose? Or is there a better way?

thanks a lot

Koerr asked Mar 12 '12 03:03



2 Answers

An interesting question!

  • I don't think the third option (NumericField) is a good choice for a boolean field; I can't think of any use case for it.
  • The Lucene search index (leaving aside stored data, which you aren't using anyway) is an inverted index, so the text of a term serves only as a lookup key for the list of matching documents. That leaves your first and second options (theoretically) identical.
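To illustrate why the spelling of the term should not matter, here is a toy inverted index in plain Java. It is only a sketch of the idea, not Lucene's actual data structures, and the class `ToyInvertedIndex` and its methods are made up for this example. It indexes the same five flags once as "true"/"false" and once as "1"/"0":

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an inverted index for a single field: term text -> doc IDs.
// Illustration only; this is NOT how Lucene lays out its index on disk.
final class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Record that document docId contains the given term.
    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    // Return the posting list for a term (empty if the term was never indexed).
    List<Integer> search(String term) {
        List<Integer> hits = postings.get(term);
        return hits == null ? Collections.<Integer>emptyList() : hits;
    }

    // Number of distinct terms in the dictionary.
    int termCount() {
        return postings.size();
    }

    // Index the same flags, spelling the two values as trueTerm / falseTerm.
    static ToyInvertedIndex build(boolean[] flags, String trueTerm, String falseTerm) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        for (int docId = 0; docId < flags.length; docId++) {
            index.add(docId, flags[docId] ? trueTerm : falseTerm);
        }
        return index;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false, true, true, false};
        ToyInvertedIndex words = build(flags, "true", "false");
        ToyInvertedIndex digits = build(flags, "1", "0");
        System.out.println(words.search("true")); // prints [0, 2, 3]
        System.out.println(digits.search("1"));   // prints [0, 2, 3]
    }
}
```

Both variants end up with two dictionary entries and identical posting lists; only the key text differs, which for a two-term field is a difference of a few bytes.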

If I were faced with this choice, I think I would go with option one (the "true" and "false" terms), if that helps you decide.

Your choice of NOT_ANALYZED_NO_NORMS looks good, I think.

Adrian Conlon answered Sep 20 '22 09:09



Lucene jumps through an elaborate set of hoops to make NumericField searchable by NumericRangeQuery, so definitely avoid it in all cases where your values don't represent quantities. For example, even if you index an integer, if it only serves as a unique ID you would still want to use a plain String field. Using "true"/"false" is the most natural way to index a boolean, while using "1"/"0" gives just a slight advantage by avoiding the possibility of a case mismatch or typo. I'd say this advantage is not worth much, and I would go for true/false.
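One way to get the typo protection of "1"/"0" while keeping "true"/"false" is to never write the literal by hand and instead funnel every value through one small helper. The following is just a sketch; `BooleanTerm` is a hypothetical class, not part of Lucene, and the string it returns is what you would pass as the field value when indexing and as the term text when querying:

```java
import java.util.Locale;

// Hypothetical helper (not a Lucene class): the single place that decides
// how a boolean is spelled as an index term.
final class BooleanTerm {
    private BooleanTerm() {}

    // Canonical term text for a boolean: always "true" or "false".
    static String of(boolean value) {
        return String.valueOf(value);
    }

    // Normalize external input; rejects anything that is not true/false
    // (ignoring case and surrounding whitespace) instead of guessing.
    static String parse(String input) {
        String normalized = input.trim().toLowerCase(Locale.ROOT);
        if (normalized.equals("true") || normalized.equals("false")) {
            return normalized;
        }
        throw new IllegalArgumentException("Not a boolean term: " + input);
    }

    public static void main(String[] args) {
        System.out.println(of(true));         // prints true
        System.out.println(parse("TRUE"));    // prints true
        System.out.println(parse(" False ")); // prints false
    }
}
```

With this in place, a misspelling such as "ture" fails loudly at the call site rather than silently producing a term that matches no documents.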

Marko Topolnik answered Sep 19 '22 09:09
