
How to read a Lucene index?

I'm working on a project for which I want to build a tag cloud by reading a Lucene index and pruning it down. I didn't set up the Lucene engine myself; someone else on the team did, and I just want to read its index. Do you know how to do that in Java?

John Manak asked Feb 25 '10 15:02



People also ask

How do I open a Lucene index file?

If you would like to explore your indexed data, once it has been created, you can use Luke. In case you have not used it before: To run Luke, you need to download a binary release from the main download page. Unzip the file, and then navigate to the luke directory. Then run the relevant script ( luke.

How does Lucene index search work?

Simply put, Lucene uses an “inverted indexing” of data – instead of mapping pages to keywords, it maps keywords to pages just like a glossary at the end of any book. This allows for faster search responses, as it searches through an index, instead of searching through text directly.

What is the Lucene index file?

The index stores statistics about terms in order to make term-based search more efficient. Lucene's index falls into the family of indexes known as an inverted index. This is because it can list, for a term, the documents that contain it. This is the inverse of the natural relationship, in which documents list terms.
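
To make the term-to-documents direction concrete, here is a minimal sketch of an inverted index built with plain Java collections. It is not how Lucene stores its data on disk; the class name InvertedIndexSketch, its methods, and the sample sentences are made up for illustration only.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of Lucene.
final class InvertedIndexSketch {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Splits the text on non-word characters and records docId under each term.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Lists, for a term, the documents that contain it.
    SortedSet<Integer> docsFor(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    // Number of documents containing the term (a per-term statistic).
    int docFreq(String term) {
        return docsFor(term).size();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "Lucene builds an inverted index");
        index.add(1, "An index maps terms to documents");
        index.add(2, "Luke lets you browse a Lucene index");

        System.out.println(index.docsFor("lucene")); // prints [0, 2]
        System.out.println(index.docsFor("index"));  // prints [0, 1, 2]
        System.out.println(index.docFreq("luke"));   // prints 1
    }
}
```

Looking up a term is a single map access, which is why searching through such an index is faster than scanning the text of every document.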


2 Answers

Not sure what you mean by "reading" an Index:

  1. If you want to query it, you can use the IndexSearcher class.

  2. IndexReader allows you to open the index in read mode and walk through its documents and terms.

If you just want to view the contents of the index, you can use Luke, a GUI tool for browsing Lucene indexes.
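
For the tag cloud mentioned in the question, the pruning step could look roughly like the sketch below once the terms and their document frequencies have been read out of the index. It uses plain Java collections and hard-coded sample counts rather than a real index; the class name TagCloudPruner, the prune method, and both thresholds are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the input map stands in for counts read from an index.
final class TagCloudPruner {

    // Keeps at most maxTags terms whose document frequency is at least
    // minDocFreq, ordered from most to least frequent.
    static Map<String, Integer> prune(Map<String, Integer> docFreqs,
                                      int minDocFreq, int maxTags) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreqs.entrySet());
        // Descending frequency; ties broken alphabetically for stable output.
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(b.getValue(), a.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });

        Map<String, Integer> pruned = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : entries) {
            if (pruned.size() >= maxTags) {
                break;
            }
            if (e.getValue() >= minDocFreq) {
                pruned.put(e.getKey(), e.getValue());
            }
        }
        return pruned;
    }

    public static void main(String[] args) {
        // Made-up sample counts, for illustration only.
        Map<String, Integer> sample = new LinkedHashMap<>();
        sample.put("java", 15);
        sample.put("typo", 1);
        sample.put("lucene", 42);
        sample.put("index", 37);

        System.out.println(prune(sample, 5, 2)); // prints {lucene=42, index=37}
    }
}
```

The returned map preserves the frequency order, so the counts can be mapped directly to font sizes in the cloud.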

Mikos answered Oct 05 '22 20:10



You can do it like this:

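import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

// Lucene 2.9/3.x-era API: open the index directory in read-only mode.
IndexReader r = IndexReader.open(FSDirectory.open(new File("prdb_index")), true);

// Document ids run from 0 to maxDoc() - 1. numDocs() excludes deleted
// documents, so looping up to numDocs() could skip live documents.
int max = r.maxDoc();
for (int i = 0; i < max; i++)
{
    if (!r.isDeleted(i))
    {
        // Only stored fields are returned with the document.
        Document d = r.document(i);
        System.out.println("d=" + d);
    }
}
r.close();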
Srikar Appalaraju answered Oct 05 '22 20:10
