
How can I get DocId when adding a document in Lucene index?

I am indexing rows of data from a database in Lucene.Net. Each row corresponds to one Document.

I want to store the DocId in my database, so that I can use the DocId from the search results to retrieve rows quickly.

Currently I first retrieve the PK from the result docs, which I think is slower than retrieving the row directly from the database using the DocId.

How can I find the DocId when adding a document to Lucene?

Rohit asked Mar 11 '10 14:03



1 Answer

Relying on Lucene's internal DocId is a bad idea, and Lucene itself avoids depending on it: internal document numbers are not stable and can change when index segments are merged. I suggest you create your own DocId and store it as a field in each document. In a database I would use an auto-increment column; if your application does not use a relational database, you can generate this kind of value programmatically. Beyond that, I suggest you read "Search Engine versus DBMS". I believe that only fields that may be searched should be stored in Lucene, and the rest of the row belongs in the database, so the sequence of events is:

  1. Using Lucene, search for some text and read your own DocId from each matching document.
  2. Use that DocId to retrieve the full row from the database.
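
The two steps above can be sketched without any Lucene dependency. The following is only a toy illustration, assuming an in-memory map as a stand-in for the index and another as a stand-in for the database; the class `KeyLookupSketch` and its methods `addRow`, `search`, and `fetchRows` are hypothetical names, not part of Lucene or Lucene.Net. In a real index the application key would be kept as a stored field on each document and read back from the hits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy stand-in for the "index + database" split described above.
// Everything here is hypothetical; it only mirrors the flow, not Lucene's API.
class KeyLookupSketch {
    // Stand-in for the database: application key -> full row.
    private final Map<Long, String> database = new HashMap<>();
    // Stand-in for the index: token -> application keys of rows containing it.
    private final Map<String, SortedSet<Long>> index = new HashMap<>();
    // Mimics an auto-increment column; this is our own DocId, not Lucene's.
    private long nextKey = 1;

    // Stores the full row in the "database" and indexes only the searchable text.
    long addRow(String searchableText, String fullRow) {
        long key = nextKey++;
        database.put(key, fullRow);
        for (String token : searchableText.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, t -> new TreeSet<>()).add(key);
            }
        }
        return key;
    }

    // Step 1: search the index and get back our own keys.
    List<Long> search(String term) {
        List<Long> result = new ArrayList<>();
        SortedSet<Long> keys = index.get(term.toLowerCase());
        if (keys != null) {
            result.addAll(keys);
        }
        return result;
    }

    // Step 2: use the keys to load the full rows from the "database".
    List<String> fetchRows(List<Long> keys) {
        List<String> rows = new ArrayList<>();
        for (Long key : keys) {
            String row = database.get(key);
            if (row != null) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        KeyLookupSketch sketch = new KeyLookupSketch();
        sketch.addRow("Lucene search library", "row A: full record for the first item");
        sketch.addRow("relational database", "row B: full record for the second item");
        for (String row : sketch.fetchRows(sketch.search("lucene"))) {
            System.out.println(row);
        }
    }
}
```

Because the key is assigned by the application, it stays valid however the index is rebuilt or merged, which is the property the internal DocId does not give you.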
Yuval F answered Sep 28 '22 18:09
