I'm working on a project for which I want to build a tag cloud by reading a Lucene index and pruning it down. I didn't set up the Lucene engine; someone else on the team did, and now I just want to read its index. Do you know how to do that in Java?
If you would like to explore your indexed data once it has been created, you can use Luke. In case you have not used it before: to run Luke, download a binary release from the main download page, unzip the file, and navigate to the luke directory. Then run the relevant script ( luke.
Simply put, Lucene uses an "inverted index" of the data: instead of mapping pages to keywords, it maps keywords to pages, much like the index at the back of a book. This allows for faster search responses, because a query looks terms up in the index instead of scanning the text directly.
The index stores statistics about terms in order to make term-based search more efficient. Lucene's index falls into the family of indexes known as an inverted index. This is because it can list, for a term, the documents that contain it. This is the inverse of the natural relationship, in which documents list terms.
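To make the term-to-documents mapping concrete, here is a minimal sketch of an inverted index built with plain Java collections. It is not how Lucene stores its index internally; the class name InvertedIndexSketch, the build method, and the sample sentences are made up for illustration, and tokenizing is reduced to lowercasing and splitting on non-word characters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Maps each term to the sorted set of ids of the documents that contain it.
    public static Map<String, Set<Integer>> build(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            // Naive tokenizer (an assumption): lowercase, split on non-word characters.
            for (String term : docs.get(docId).toLowerCase().split("\\W+")) {
                if (term.isEmpty()) {
                    continue;
                }
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene builds an index",
                "An index maps terms to documents",
                "Luke reads a Lucene index");
        Map<String, Set<Integer>> index = build(docs);
        System.out.println(index.get("lucene")); // prints [0, 2]
        System.out.println(index.get("index"));  // prints [0, 1, 2]
    }
}
```

Looking up a term is a single map access that returns every matching document id, which is the "inverse" direction described above.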
Not sure what you mean by "reading" an index:
If you want to query it, you can use the IndexSearcher class.
IndexReader allows you to open the index in read mode.
If you want to view the contents of the index, you can use Luke.
You do it like this -
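import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

// Old-style Lucene API: IndexReader.open(String) and isDeleted(int)
// are not available in recent Lucene releases.
IndexReader r = IndexReader.open("prdb_index");
// Document ids run from 0 to maxDoc() - 1. numDocs() excludes deleted
// documents, so looping up to numDocs() could miss live documents.
int max = r.maxDoc();
for (int i = 0; i < max; i++) {
    // Skip the ids of deleted documents.
    if (!r.isDeleted(i)) {
        Document d = r.document(i);
        System.out.println("d=" + d);
    }
}
r.close();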
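For the tag cloud itself, the pruning step could be sketched as follows. This assumes the per-term counts have already been collected into a map (how to gather them depends on your Lucene version and is not shown here); the class name TagCloudSketch, the prune method, the thresholds, and the sample counts are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagCloudSketch {

    // Keeps at most maxTags terms, most frequent first, dropping any term
    // that occurs fewer than minCount times.
    public static Map<String, Integer> prune(Map<String, Integer> counts,
                                             int minCount, int maxTags) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count descending; break ties alphabetically for a stable result.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, Integer> tags = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (tags.size() >= maxTags) {
                break;
            }
            if (e.getValue() >= minCount) {
                tags.put(e.getKey(), e.getValue());
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        // Made-up counts standing in for values read from the index.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("the", 3);
        counts.put("luke", 9);
        counts.put("index", 17);
        counts.put("lucene", 42);
        System.out.println(prune(counts, 5, 2)); // prints {lucene=42, index=17}
    }
}
```

The surviving counts can then be mapped to font sizes or weights when rendering the cloud.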