How can I get the list of unique terms from a specific field in Lucene?

Question

I have an index built from a large corpus with several fields. Only one of these fields contains text. I need to extract the unique words from the whole index based on this field. Does anyone know how I can do that with Lucene in Java?

Alex Moore-Niemi · Accepted Answer

As of Lucene 7+, the older approaches and some related links are obsolete.

Here's what's current:

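import org.apache.lucene.index.*; // LeafReaderContext, Term, Terms, TermsEnum
import org.apache.lucene.util.BytesRef;

// reader is an open IndexReader; it has leaves (segments), you'll iterate through those.
// terms(), iterator() and next() can all throw IOException.
final String fieldName = "content"; // specify the field here

for (LeafReaderContext leaf : reader.leaves()) {
  System.out.println("leaf: " + leaf.ord);
  Terms terms = leaf.reader().terms(fieldName);
  if (terms == null) continue; // this leaf has no terms for the field
  TermsEnum termsEnum = terms.iterator();
  BytesRef bytes;
  // this stops at 20 just to sample the head; next() returns null when the terms run out
  for (int i = 0; i < 20 && (bytes = termsEnum.next()) != null; i++) {
    // copy the bytes, since the enum may reuse the BytesRef it returns
    final Term content = new Term(fieldName, BytesRef.deepCopyOf(bytes));
    System.out.println("i: " + i + ", term: " + content);
  }
}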
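
One caveat: each leaf (segment) enumerates only its own terms, so the same term can show up in more than one leaf. To get a list that is unique across the whole index, the per-leaf results still need to be merged. Here is a minimal sketch of that merge step in plain Java, with iterators of strings standing in for the per-leaf term enumerations. The class name UniqueTerms and the method merge are hypothetical, not part of Lucene.

```java
import java.util.Iterator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper: merges per-leaf term streams into one set of unique terms. */
final class UniqueTerms {

  // Each iterator stands in for one leaf's term enumeration. With Lucene, the
  // strings would come from TermsEnum.next() converted via BytesRef.utf8ToString().
  static SortedSet<String> merge(List<Iterator<String>> perLeafTerms) {
    SortedSet<String> unique = new TreeSet<>();
    for (Iterator<String> leafTerms : perLeafTerms) {
      while (leafTerms.hasNext()) {
        unique.add(leafTerms.next()); // duplicates across leaves collapse here
      }
    }
    return unique;
  }

  public static void main(String[] args) {
    // Two pretend leaves that share the term "lucene"
    List<Iterator<String>> leaves = List.of(
        List.of("apple", "lucene").iterator(),
        List.of("index", "lucene").iterator());
    System.out.println(merge(leaves)); // prints [apple, index, lucene]
  }
}
```

For a very large corpus the set may not fit comfortably in memory, in which case writing terms out as they are merged is probably the better option.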

Tags:

java

lucene

Hossein