My use case involves indexing a Lucene document and then, on multiple future occasions, adding terms that point to this existing doc, without deleting and re-adding the entire document for each new term (for performance reasons, and because I don't keep the original terms).
I do know that a document cannot be truly updated. My question is why?
Or more precisely, why are all forms of updates (terms, stored fields) not supported?
Why is it not possible to add another term that points to an existing document? Technically, isn't all that's needed to place the existing doc ID in the term's posting list? Why is that hard? Are there some immutable statistics that get in the way?
Are there any workarounds for supporting my use case of adding a term (indexed field) to an existing doc?
Step 1 − The IndexWriter class acts as the core component that creates/updates indexes during the indexing process.
Step 2 − Create an IndexWriter object.
Step 3 − Create a Lucene directory that points to the location where the indexes are to be stored.
Why is Lucene faster? Lucene is very fast at searching for data because of its inverted index technique. Normally, datasources structure the data as an object or record, which in turn have fields and values.
Lucene's “doc values” is basically a hack that takes advantage of Cassandra-style “columnar” data storage. We store all the document values in a simple format on-disk. Basically, in flat files.
Simply put, Lucene uses “inverted indexing” of data: instead of mapping pages to keywords, it maps keywords to pages, just like the index at the end of a book. This allows for faster search responses, because a query looks up terms in the index instead of scanning the text directly.
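To make the keyword-to-page mapping concrete, here is a minimal sketch of an inverted index in plain Java. The class name `ToyInvertedIndex` and its methods are made up for illustration; this is not Lucene's API and says nothing about Lucene's actual on-disk format. It only shows the idea of a term pointing to a sorted list of doc IDs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index (hypothetical, for illustration only):
// each term maps to a sorted list of doc IDs, i.e. a posting list.
class ToyInvertedIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    // Splits the text on whitespace, lowercases it, and records the
    // new doc ID under every distinct term. Returns the assigned doc ID.
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(token, t -> new ArrayList<>());
            // Doc IDs only grow, so appending keeps the list sorted.
            // Skip the append if this doc is already the last entry.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    // Looks up a term directly instead of scanning every document's text.
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }
}
```

A lookup such as `search("search")` is a single map access followed by reading one list, which is where the speed advantage over scanning raw text comes from.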
Gili, editing a document changes the postings of the related terms, and this is problematic because of the structure of a term's posting list. The posting list is sorted by doc ID and stored sequentially in memory. Thus, to add a document to a term's posting list you have to give it a higher doc ID than the ones already in the list. This is done by deleting and re-indexing the entire document.
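The delete-and-re-index pattern described above can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene code: the class `AppendOnlyIndex` and its methods are hypothetical, posting lists are assumed to be append-only, and deletions are modelled as a simple set of dead doc IDs. In Lucene itself the corresponding call is `IndexWriter.updateDocument`, which deletes the documents matching a term and adds the replacement document.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy append-only index (hypothetical, for illustration only).
class AppendOnlyIndex {
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final Set<Integer> deleted = new HashSet<>();
    private int nextDocId = 0;

    // Adds a document under a fresh doc ID that is higher than every
    // existing one, so each posting list stays sorted by plain appending.
    int add(Collection<String> terms) {
        int docId = nextDocId++;
        for (String term : new TreeSet<>(terms)) { // distinct terms only
            postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
        }
        return docId;
    }

    // "Update" = mark the old doc ID as deleted, then re-add the FULL
    // term set under a new doc ID. The caller must supply the old terms too.
    int update(int oldDocId, Collection<String> allTerms) {
        deleted.add(oldDocId);
        return add(allTerms);
    }

    // Returns only live doc IDs; deleted entries are filtered at read time.
    List<Integer> search(String term) {
        List<Integer> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, Collections.<Integer>emptyList())) {
            if (!deleted.contains(docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    // Exposes the stored posting list, including entries for deleted docs.
    List<Integer> rawPostings(String term) {
        return new ArrayList<>(postings.getOrDefault(term, Collections.<Integer>emptyList()));
    }
}
```

Note that `update` needs the complete term set, old terms plus the new one. That is exactly the cost the question is trying to avoid: if the original terms were not kept, they have to be recovered or stored somewhere before the document can be re-added.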