
Lucene 3.0.3 does not delete document

Tags: java, lucene

We use Lucene to index some internal documents. Sometimes we need to remove documents. Each document has a unique id and is represented by a class DocItem, as follows (all the code is a simplified version with only the parts that are, I hope, significant):

public final class DocItem {

  public static final String fID = "id";
  public static final String fTITLE = "title";

  private Document doc = new Document();
  private Field id = new Field(fID, "", Field.Store.YES, Field.Index.ANALYZED);
  private Field title = new Field(fTITLE, "", Field.Store.YES, Field.Index.ANALYZED);

  public DocItem() {
    doc.add(id);
    doc.add(title);
  }

  ... getters & setters

  public Document getDoc() {
    return doc;
  }
}

So, to index a document, a new DocItem is created and passed to an indexer class as follows:

public static void index(DocItem docitem) {
  File file = new File("indexdir");
  Directory dir= new SimpleFSDirectory(file);
  IndexWriter idxWriter = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
  idxWriter.addDocument(docitem.getDoc());
  idxWriter.close();
}

We created an auxiliary method to iterate over the index directory:

public static void listAll() {
  File file = new File("indexdir");
  Directory dir = new SimpleFSDirectory(file);
  IndexReader reader = IndexReader.open(dir);

  for (int i = 0; i < reader.maxDoc(); i++) {
    Document doc = reader.document(i);
    System.out.println(doc.get(DocItem.fID));
  }
}

Running the listAll, we can see that our docs are being indexed properly. At least, we can see the id and other attributes.

We retrieve the document using IndexSearcher as follows:

public static DocItem search(String id) {
  File file = new File("indexdir");
  Directory dir = new SimpleFSDirectory(file);
  IndexSearcher searcher = new IndexSearcher(dir, true);
  Query q = new QueryParser(Version.LUCENE_30, DocItem.fID, new StandardAnalyzer(Version.LUCENE_30)).parse(id);
  TopDocs td = searcher.search(q, 1);
  ScoreDoc[] hits = td.scoreDocs;
  searcher.close();
  return hits[0];
}

So after retrieving it, we are trying to delete it with:

public static void Delete(DocItem docitem) {
  File file = new File("indexdir");
  Directory dir= new SimpleFSDirectory(file);
  IndexWriter idxWriter = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
  idxWriter.deleteDocuments(new Term(DocItem.fID, docitem.getId()));
  idxWriter.commit();
  idxWriter.close();
}

The problem is that it doesn't work: the document is never deleted. If I run listAll() after the deletion, the document is still there. We also tried deleting through IndexReader, with no luck.

Based on this post and this post, we think we are using it correctly.

What are we doing wrong? Any advice? We are using Lucene 3.0.3 and Java 1.6.0_24.

TIA,

Bob

Bob Rivers asked Apr 01 '11 14:04



2 Answers

I would suggest using IndexReader.deleteDocuments: it returns the number of documents deleted, which will help you narrow down whether the deletions happen in the first place.

The advantage of this over the IndexWriter method is that it returns the total number of documents deleted; if none were deleted, it returns 0.
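To illustrate what a returned count of 0 could tell you, here is a toy sketch in plain Java. It is not Lucene code: the class DeleteCountSketch, its methods, and the id "DOC42" are all made up for illustration, and analyze() only imitates lowercasing and whitespace splitting, which is far less than what StandardAnalyzer really does. The assumption it models is that a delete by Term compares the term against the indexed terms exactly, without analyzing it first, so a field indexed as ANALYZED may no longer contain the raw id. Whether that is what happens in your index is something the count can help confirm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy stand-in for an index. NOT Lucene; every name here is hypothetical. */
final class DeleteCountSketch {

    // Each "document" is reduced to the set of terms indexed for its id field.
    private final List<Set<String>> docs = new ArrayList<Set<String>>();

    /** Crude imitation of an analyzer: split on whitespace, then lowercase. */
    static Set<String> analyze(String value) {
        Set<String> terms = new HashSet<String>();
        for (String token : value.split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Index the id through the toy analyzer (models an analyzed field). */
    void addAnalyzed(String id) {
        docs.add(analyze(id));
    }

    /** Index the id as one unmodified term (models a non-analyzed field). */
    void addVerbatim(String id) {
        Set<String> terms = new HashSet<String>();
        terms.add(id);
        docs.add(terms);
    }

    /** Delete by exact term match (the term itself is not analyzed); return the count. */
    int deleteByTerm(String term) {
        int deleted = 0;
        for (Iterator<Set<String>> it = docs.iterator(); it.hasNext();) {
            if (it.next().contains(term)) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        DeleteCountSketch analyzed = new DeleteCountSketch();
        analyzed.addAnalyzed("DOC42");
        System.out.println(analyzed.deleteByTerm("DOC42")); // 0: indexed term is "doc42"
        System.out.println(analyzed.deleteByTerm("doc42")); // 1

        DeleteCountSketch verbatim = new DeleteCountSketch();
        verbatim.addVerbatim("DOC42");
        System.out.println(verbatim.deleteByTerm("DOC42")); // 1
    }
}
```

If the real count is 0 for an id that listAll() prints, comparing the raw id with the terms actually stored in the index would be a reasonable next check.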

Also see "How do I delete documents from the index?" and this post.

Edit: I also noticed that you open the IndexReader in read-only mode. Can you change the IndexReader open call in listAll() to pass false as the second parameter? That allows read-write access; the read-only mode is perhaps the source of the error.

Narayan answered Sep 23 '22 04:09



I call IndexWriterConfig#setMaxBufferedDeleteTerms(1) during IndexWriter instantiation/configuration and all delete operations go to disc immediately. Maybe it's not correct design-wise, but solves the problem explained here.

yegor256 answered Sep 23 '22 04:09
