
Find list of terms indexed by Lucene

Tags:

lucene

Is it possible to extract all the terms in a Lucene index as a list of strings? I couldn't find that functionality in the documentation. Thanks!

asked Jun 21 '12 23:06 by Frank




2 Answers

In Lucene 4 (and 5):

// terms is null if the field does not exist; walk the result through its TermsEnum.
Terms terms = SlowCompositeReaderWrapper.wrap(directoryReader).terms("field");

Edit:

This seems to be the 'correct' way now (Lucene 6 and up):

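import org.apache.lucene.search.spell.LuceneDictionary; // lucene-suggest module
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.BytesRefIterator;

LuceneDictionary ld = new LuceneDictionary( indexReader, "field" );
// Recent releases expose getEntryIterator(); early 4.x releases called it getWordsIterator().
BytesRefIterator iterator = ld.getEntryIterator();
BytesRef byteRef = null;
while ( ( byteRef = iterator.next() ) != null )
{
    // Decode right away; the BytesRef may be reused by the next call to next().
    String term = byteRef.utf8ToString();
}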
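
Since the question asks for a list of strings, here is a minimal sketch of draining a "next() returns null when done" iterator into a List<String>. To keep it runnable without Lucene on the classpath, it uses a hypothetical stand-in interface (ByteTermIterator) that only mimics the null-terminated contract of BytesRefIterator; the class and method names (TermListSketch, fromStrings, drain) are made up for illustration and are not part of any Lucene API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class TermListSketch {

    /** Hypothetical stand-in: next() returns null once the terms are exhausted. */
    interface ByteTermIterator {
        byte[] next();
    }

    /** Builds a stand-in iterator over UTF-8 encoded terms (illustration only). */
    static ByteTermIterator fromStrings(String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        return () -> it.hasNext() ? it.next().getBytes(StandardCharsets.UTF_8) : null;
    }

    /** Drains the iterator, decoding each term to a String before asking for the next one. */
    static List<String> drain(ByteTermIterator iterator) {
        List<String> result = new ArrayList<>();
        byte[] bytes;
        while ((bytes = iterator.next()) != null) {
            result.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(drain(fromStrings("apple", "banana", "cherry")));
    }
}
```

With the real API the loop body would presumably be result.add(byteRef.utf8ToString()) instead of the manual decode. Decoding inside the loop matters there because the returned BytesRef may be reused between calls.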
answered Oct 22 '22 12:10 by Rob Audenaerde


Lucene 3:

  • C#: C# Lucene get all the index

  • Java:

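    // `path` is assumed to be a Directory, e.g. FSDirectory.open(new File(indexDir)).
    IndexReader indexReader = IndexReader.open(path);
    // With no arguments, terms() enumerates every term of every field.
    TermEnum termEnum = indexReader.terms();
    while (termEnum.next()) {
        Term term = termEnum.term();
        // term.field() gives the field name; term.text() gives the term itself.
        System.out.println(term.text());
    }
    termEnum.close();
    indexReader.close();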
  • Java (all terms for a specific field): How can I get the list of unique terms from a specific field in Lucene?

  • Python: Finding a single fields terms with Lucene (PyLucene)

answered Oct 22 '22 12:10 by miku