Is it possible to extract the list of all the terms in a Lucene index as a list of strings? I couldn't find that functionality in the doc. Thanks!
Step 1 − Create object of IndexWriter. Step 2 − Create a Lucene directory which should point to location where indexes are to be stored. Step 3 − Initialize the IndexWriter object created with the index directory, a standard analyzer having version information and other required/optional parameters.
Simply put, Lucene uses an “inverted indexing” of data – instead of mapping pages to keywords, it maps keywords to pages just like a glossary at the end of any book. This allows for faster search responses, as it searches through an index, instead of searching through text directly.
Apache Lucene is a free and open-source search engine software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene is widely used as a standard foundation for non-research search applications. Lucene.
In Lucene 4 (and 5):
Terms terms = SlowCompositeReaderWrapper.wrap(directoryReader).terms("field");
Edit:
This seems to be the 'correct' way now (Lucene 6 and up):
LuceneDictionary ld = new LuceneDictionary( indexReader, "field" );
BytesRefIterator iterator = ld.getWordsIterator();
BytesRef byteRef = null;
while ( ( byteRef = iterator.next() ) != null )
{
String term = byteRef.utf8ToString();
}
Lucene 3:
C#: C# Lucene get all the index
Java:
IndexReader indexReader = IndexReader.open(path);
TermEnum termEnum = indexReader.terms();
while (termEnum.next()) {
Term term = termEnum.term();
System.out.println(term.text());
}
termEnum.close();
indexReader.close();
Java (all terms for a specific field): How can I get the list of unique terms from a specific field in Lucene?
Python: Finding a single fields terms with Lucene (PyLucene)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With