
How to update a Lucene.NET index?

I'm developing a Desktop Search Engine in Visual Basic 9 (VS2008) using Lucene.NET (v2.0).

I use the following code to initialize the IndexWriter:

Private writer As IndexWriter

writer = New IndexWriter(indexDirectory, New StandardAnalyzer(), False)

writer.SetUseCompoundFile(True)

If I select the same document folder (containing files to be indexed) twice, two different entries for each file in that document folder are created in the index.

I want the IndexWriter to discard any files that are already present in the Index.

What should I do to ensure this?

user57175 asked Jan 24 '09 16:01



3 Answers

If you want to delete all content in the index and refill it, you could use this statement

writer = New IndexWriter(indexDirectory, New StandardAnalyzer(), True)

The last parameter of the IndexWriter constructor determines whether a new index is created, or whether an existing index is opened for the addition of new documents.
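If you only want a fresh index when none exists yet, one option is to derive that flag instead of hard-coding it. The following is a minimal sketch, not a definitive implementation. It assumes indexDirectory is a file-system path (String) and that IndexReader.IndexExists is available in your Lucene.NET build. The module and the OpenWriter helper are hypothetical names used for illustration.

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Index

Module IndexWriterFactory
    ' Hypothetical helper: opens the existing index if there is one,
    ' otherwise creates a new, empty index at the same location.
    Function OpenWriter(ByVal indexDirectory As String) As IndexWriter
        ' create = True would discard everything already in the index,
        ' so only pass True when no index exists yet.
        Dim create As Boolean = Not IndexReader.IndexExists(indexDirectory)
        Dim writer As New IndexWriter(indexDirectory, New StandardAnalyzer(), create)
        writer.SetUseCompoundFile(True)
        Return writer
    End Function
End Module
```

Note that this only protects the existing index from being overwritten. It does not prevent duplicate entries when the same folder is indexed twice.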

splattne answered Sep 18 '22 20:09


To update a Lucene index, you need to delete the old entry and write the new one. Use an IndexReader to find the current item, delete it, and then add your new item with the writer. The same is true for multiple entries, which I think is what you are trying to do. Just find all the entries, delete them all, and then write the new entries.
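The delete-then-add sequence can be sketched in VB as follows. This is an illustration under stated assumptions, not the only way to do it. It assumes each file was indexed with an untokenized field holding its full path. The field name "path" and the ReplaceDocument helper are hypothetical.

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Documents
Imports Lucene.Net.Index

Module IndexUpdater
    ' Hypothetical helper: replaces whatever is indexed for filePath with newDoc.
    ' Assumes every document was indexed with an untokenized "path" field.
    Sub ReplaceDocument(ByVal indexDirectory As String, ByVal filePath As String, ByVal newDoc As Document)
        ' Step 1: delete all existing entries for this file.
        Dim reader As IndexReader = IndexReader.Open(indexDirectory)
        Try
            reader.DeleteDocuments(New Term("path", filePath))
        Finally
            reader.Close() ' closing the reader writes the deletions to disk
        End Try

        ' Step 2: add the fresh entry. create = False keeps the rest of the index.
        Dim writer As New IndexWriter(indexDirectory, New StandardAnalyzer(), False)
        Try
            writer.AddDocument(newDoc)
        Finally
            writer.Close()
        End Try
    End Sub
End Module
```

The reader is closed before the writer is opened because, in this version of Lucene, a reader that deletes and an IndexWriter both need the index write lock and cannot hold it at the same time.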

Steve Severance answered Sep 21 '22 20:09


As Steve mentioned, you need to use an instance of IndexReader to delete the old entries. Its DeleteDocuments method accepts a Term object and removes every document that contains that term; the singular DeleteDocument method accepts Lucene's internal id of the document instead. Using the internal id is generally not recommended, as it can and will change as Lucene merges segments.

The best way is to use a unique identifier that you've stored in the index specific to your application. For example, in an index of patients in a doctor's office, if you had a field called "patient_id" you could create a term and pass that as an argument to DeleteDocuments. See the following example (sorry, C#):

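int patientID = 12;
IndexReader indexReader = IndexReader.Open( indexDirectory );
// Term takes string values, so convert the numeric id first.
indexReader.DeleteDocuments( new Term( "patient_id", patientID.ToString() ) );
// Closing the reader commits the deletion to the index.
indexReader.Close();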

Then you could add the patient record again with an instance of IndexWriter. I learned a lot from this article http://www.codeproject.com/KB/library/IntroducingLucene.aspx.
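For reference, re-adding the record might look like this in VB, to match the question. It is a sketch under assumptions: the AddPatient helper and the "name" field are hypothetical, and indexDirectory is taken to be a path string. The important detail is that "patient_id" is indexed untokenized, so the exact term used for the delete can match it.

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Documents
Imports Lucene.Net.Index

Module PatientIndexer
    ' Hypothetical helper: re-adds a patient after the old entry was deleted.
    Sub AddPatient(ByVal indexDirectory As String, ByVal patientID As Integer, ByVal name As String)
        Dim doc As New Document()
        ' UN_TOKENIZED indexes the id as a single exact term, so
        ' New Term("patient_id", patientID.ToString()) can match it later.
        doc.Add(New Field("patient_id", patientID.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED))
        ' "name" is an illustrative field; TOKENIZED runs it through the analyzer.
        doc.Add(New Field("name", name, Field.Store.YES, Field.Index.TOKENIZED))

        ' create = False opens the existing index instead of replacing it.
        Dim writer As New IndexWriter(indexDirectory, New StandardAnalyzer(), False)
        Try
            writer.AddDocument(doc)
        Finally
            writer.Close()
        End Try
    End Sub
End Module
```

For the desktop search case in the question, the same idea applies with the file path as the unique identifier instead of a patient id.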

Hope this helps.

Ryan Ische answered Sep 19 '22 20:09