
Getting the Doc ID in Lucene

In Lucene, I can do the following:

doc.GetField("mycustomfield").StringValue();

This retrieves the value of a field from a document in the index.

My question: for the same 'doc', is there a way to get the document ID? Luke displays it, so there must be a way to figure this out. I need it to delete documents on updates.

I scoured the docs but have not found the field name to pass to GetField, or whether another method already exists for this.

Matt asked Oct 14 '22 14:10



2 Answers

Turns out you have to do this:

var hits = searcher.Search(query);
var result = hits.Id(0);

As opposed to:

var results = hits.Doc(i);
var docid = results.<...> //there's nothing I could find there to do this

Matt answered Oct 20 '22 16:10



I suspect the reason you're having trouble finding any documentation on determining the ID of a particular Lucene Document is that these are not truly "IDs". In other words, they are not meant to be looked up and stored for later use. If you do store them, you will not get the results you were hoping for, because the IDs change when the index is optimized.

Instead, think of the IDs as the current "offset" of a particular document from the start of the index, which will change when deleted documents are physically removed from the index files.
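
To make the "offset" idea concrete, here is a toy model of it. It does not use Lucene at all: ToyIndex and its methods are hypothetical names invented for this illustration, and the real index is far more involved. The only point is that an ID which is just a position survives a delete but shifts once deleted entries are physically removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: NOT Lucene's implementation or API.
// A document's "ID" here is simply its position in the list.
public class ToyIndex
{
    private readonly List<string> docs = new List<string>();
    private readonly HashSet<int> deleted = new HashSet<int>();

    // Appends a document and returns its current ID (its position).
    public int Add(string key)
    {
        docs.Add(key);
        return docs.Count - 1;
    }

    // Marks a document as deleted; nothing is physically removed yet.
    public void Delete(int id)
    {
        deleted.Add(id);
    }

    // Returns the current ID of the live document with this key, or -1.
    public int IdOf(string key)
    {
        for (int i = 0; i < docs.Count; i++)
        {
            if (!deleted.Contains(i) && docs[i] == key) return i;
        }
        return -1;
    }

    // Physically removes deleted documents, so later positions shift down.
    public void Optimize()
    {
        List<string> live = docs.Where((d, i) => !deleted.Contains(i)).ToList();
        docs.Clear();
        docs.AddRange(live);
        deleted.Clear();
    }
}

public static class ToyIndexDemo
{
    public static void Main()
    {
        var index = new ToyIndex();
        index.Add("a");
        index.Add("b");
        index.Add("c");

        Console.WriteLine(index.IdOf("c")); // prints 2
        index.Delete(index.IdOf("a"));
        Console.WriteLine(index.IdOf("c")); // prints 2 (delete only marks "a")
        index.Optimize();
        Console.WriteLine(index.IdOf("c")); // prints 1 (position shifted)
    }
}
```

An ID saved before the Optimize() call would now point at a different document, which is why storing these values for later use is unreliable.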

Now with that said, the proper way to look up the "id" of a document is:


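QueryParser parser = new QueryParser(...);
IndexSearcher searcher = new IndexSearcher(...);
Hits hits = searcher.Search(parser.Parse(...));

for (int i = 0; i < hits.Length(); i++)
{
   // Current internal ID of the i-th hit; do not store it for later use,
   // since it can change once the index is optimized.
   int id = hits.Id(i);

   // do stuff
}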

jeremyalan answered Oct 20 '22 17:10
