I'm developing a Desktop Search Engine in Visual Basic 9 (VS2008) using Lucene.NET (v2.0).
I use the following code to initialize the IndexWriter:
Private writer As IndexWriter
writer = New IndexWriter(indexDirectory, New StandardAnalyzer(), False)
writer.SetUseCompoundFile(True)
If I select the same document folder (containing files to be indexed) twice, two different entries for each file in that document folder are created in the index.
I want the IndexWriter to discard any files that are already present in the Index.
What should I do to ensure this?
Updating a document is another important part of the indexing process. It is needed when content that has already been indexed changes, which makes the existing index entries stale. This operation is also known as re-indexing.
If you want to delete all content in the index and refill it, you could use this statement:
writer = New IndexWriter(indexDirectory, New StandardAnalyzer(), True)
The last parameter of the IndexWriter constructor determines whether a new index is created, or whether an existing index is opened for the addition of new documents.
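As a minimal sketch of how that flag could be chosen at run time, the helper below creates the index only when none exists yet and otherwise opens it for appending. The module and function names are hypothetical, and it assumes the index location is passed as a path string; IndexReader.IndexExists is the Lucene.NET call doing the check.

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Index

Module IndexWriterFactory
    ' Hypothetical helper: opens the index at indexPath for appending,
    ' and creates it only when no index exists there yet.
    Function OpenWriter(ByVal indexPath As String) As IndexWriter
        ' True  -> create a new (empty) index, overwriting nothing useful
        ' False -> open the existing index and keep its documents
        Dim create As Boolean = Not IndexReader.IndexExists(indexPath)

        Dim writer As New IndexWriter(indexPath, New StandardAnalyzer(), create)
        writer.SetUseCompoundFile(True)
        Return writer
    End Function
End Module
```

Note that this only avoids wiping or failing to open the index; it does not by itself stop the same file from being added twice.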
To update a Lucene index you need to delete the old entry and write the new one. So you need to use an IndexReader to find the current item and delete it, then use the writer to add your new item. The same is true for multiple entries, which I think is what you are trying to do. Just find all the entries, delete them all, and then write the new entries.
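The delete-then-add steps above could look roughly like this in VB. This is a sketch under assumptions, not a drop-in implementation: the module and method names are hypothetical, the index is assumed to exist already at a path string, and each document is assumed to carry the file's full path in an untokenized field that I have called "path" (plus a "contents" field for the text).

```vbnet
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Documents
Imports Lucene.Net.Index

Module Reindexer
    ' Hypothetical helper: replaces the index entry for a single file.
    ' Assumes every document stores the file path in an untokenized
    ' "path" field (an assumed field name, not part of Lucene).
    Sub ReindexFile(ByVal indexPath As String, ByVal filePath As String)
        ' 1. Delete any existing entry for this path.
        Dim reader As IndexReader = IndexReader.Open(indexPath)
        Try
            reader.DeleteDocuments(New Term("path", filePath))
        Finally
            reader.Close() ' closing the reader commits the deletion
        End Try

        ' 2. Build the fresh document for the file.
        Dim doc As New Document()
        doc.Add(New Field("path", filePath, Field.Store.YES, Field.Index.UN_TOKENIZED))
        doc.Add(New Field("contents", System.IO.File.ReadAllText(filePath), _
                          Field.Store.NO, Field.Index.TOKENIZED))

        ' 3. Append it to the existing index (create = False).
        Dim writer As New IndexWriter(indexPath, New StandardAnalyzer(), False)
        Try
            writer.AddDocument(doc)
        Finally
            writer.Close()
        End Try
    End Sub
End Module
```

The reader needs the index write lock to delete, so any IndexWriter that is still open on the same index should be closed first. If you would rather skip files that are already indexed instead of replacing them, checking reader.DocFreq(New Term("path", filePath)) > 0 before adding should serve the same purpose.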
As Steve mentioned, you need to use an instance of IndexReader and call its DeleteDocuments method. DeleteDocuments accepts either an instance of a Term object or Lucene's internal id of the document (it is generally not recommended to use the internal id as it can and will change as Lucene merges segments).
The best way is to use a unique identifier that you've stored in the index specific to your application. For example, in an index of patients in a doctor's office, if you had a field called "patient_id" you could create a term and pass that as an argument to DeleteDocuments. See the following example (sorry, C#):
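int patientID = 12;
IndexReader indexReader = IndexReader.Open( indexDirectory );
// Term values are strings, so convert the numeric id
indexReader.DeleteDocuments( new Term( "patient_id", patientID.ToString() ) );
indexReader.Close(); // closing the reader commits the deletion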
Then you could add the patient record again with an instance of IndexWriter. I learned a lot from this article http://www.codeproject.com/KB/library/IntroducingLucene.aspx.
Hope this helps.