
How to get all terms for a Lucene field in Lucene 4

Tags: java, lucene, api

I'm trying to update my code from Lucene 3.4 to 4.1. I have figured out all the changes except one. I have code that needs to iterate over all term values for one field. In Lucene 3.1 there was an IndexReader#terms() method providing a TermEnum, which I could iterate over. This seems to have changed in Lucene 4.1, and even after several hours of searching the documentation I can't figure out how to do it. Can someone please point me in the right direction?

Thanks.

asked Mar 08 '13 09:03 by ali


1 Answer

Please follow the Lucene 4 migration guide:

How you obtain the enums has changed. The primary entry point is the Fields class. If you know your reader is a single segment reader, do this:

Fields fields = reader.fields();
if (fields != null) {
  ...
}

If the reader might be multi-segment, you must do this:

Fields fields = MultiFields.getFields(reader);
if (fields != null) {
  ...
}

The fields may be null (eg if the reader has no fields).

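Note that the MultiFields approach entails a performance hit on MultiReaders, as it must merge terms/docs/positions on the fly. It's generally better to instead get the sequential (per-segment) readers and then step through those readers yourself, if you can (this is how Lucene drives searches). The guide points to oal.util.ReaderUtil, i.e. org.apache.lucene.util.ReaderUtil, for this; in Lucene 4.x the per-segment readers are also exposed through IndexReader#leaves().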

If you pass a SegmentReader to MultiFields.getFields it will simply return reader.fields(), so there is no performance hit in that case.
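
To get a feel for why merging on the fly costs more than walking the segments one at a time, here is a minimal sketch in plain Java. It is not Lucene code: the class name MergedTermsSketch and the method mergeTerms are made up for illustration, each segment is assumed to be just a sorted list of strings, and String ordering stands in for Lucene's byte-wise term ordering. Every term that comes out has to pass through a priority queue, and duplicates across segments have to be collapsed, which is roughly the kind of extra work a merged view implies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustration only: a k-way merge over per-segment sorted term lists.
public class MergedTermsSketch {

    // One cursor per segment: its current term plus the remaining terms.
    private static final class Cursor implements Comparable<Cursor> {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.current = it.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    // Merges the sorted term lists into one sorted list without duplicates.
    public static List<String> mergeTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<String> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Cursor smallest = queue.poll();
            // The same term can live in several segments; emit it once.
            if (merged.isEmpty()
                    || !merged.get(merged.size() - 1).equals(smallest.current)) {
                merged.add(smallest.current);
            }
            // Advance this segment's cursor and put it back in the queue.
            if (smallest.rest.hasNext()) {
                smallest.current = smallest.rest.next();
                queue.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
                Arrays.asList("apple", "cherry"),
                Arrays.asList("banana", "cherry", "date"));
        System.out.println(mergeTerms(segments)); // [apple, banana, cherry, date]
    }
}
```

If you only need to visit every term and don't care about a single global order, stepping through each segment separately avoids the queue entirely.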

Once you have a non-null Fields you can do this:

Terms terms = fields.terms("field");
if (terms != null) {
  ...
}

The terms may be null (eg if the field does not exist).

Once you have a non-null terms you can get an enum like this:

TermsEnum termsEnum = terms.iterator(null); // in Lucene 4.x, iterator() takes a TermsEnum to reuse; pass null to get a fresh one

The returned TermsEnum will not be null.

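You can then call .next() on the TermsEnum repeatedly. Each call returns the next term as a BytesRef, and it returns null once the terms are exhausted.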
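
Putting the steps together, a minimal sketch for Lucene 4.x might look like the following. It assumes lucene-core 4.x is on the classpath and that you already have an open IndexReader; the class name FieldTermsLister and the method listTerms are made up for this example. Note that in the 4.x API Terms#iterator takes a TermsEnum to reuse, so the sketch passes null. It uses MultiFields, so the performance caveat about multi-segment readers applies.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper: collects every term of one field as a String.
public class FieldTermsLister {

    public static List<String> listTerms(IndexReader reader, String field)
            throws IOException {
        List<String> result = new ArrayList<String>();

        // Merged view over all segments; null if the reader has no fields.
        Fields fields = MultiFields.getFields(reader);
        if (fields == null) {
            return result;
        }

        // null if the field does not exist in the index.
        Terms terms = fields.terms(field);
        if (terms == null) {
            return result;
        }

        // Lucene 4.x signature: pass null when there is no enum to reuse.
        TermsEnum termsEnum = terms.iterator(null);
        BytesRef term;
        while ((term = termsEnum.next()) != null) {
            // The BytesRef may be reused by the enum, so convert it right away.
            // This assumes the field holds UTF-8 text terms.
            result.add(term.utf8ToString());
        }
        return result;
    }
}
```

Calling FieldTermsLister.listTerms(reader, "field") would then give you roughly what iterating the old TermEnum for that field did in 3.x.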

answered Oct 19 '22 13:10 by phanin