I have an existing Lucene store with many millions of documents, each one representing metadata for an entity. I have a few Id fields (Id1, Id2 .. Id5), and each document can have zero or many values for each of these fields. The index is only ever queried by one of these Ids at a time. I've indexed these fields independently and it is all working great. I initially chose Lucene because it was by far the fastest way to query such a vast number of small documents, and I am happy with my decision.
However, I must now store another type of document, which represents a different kind of metadata for entities, also has values for (Id1, Id2 .. Id5), and will also be queried by one of those Ids at a time. The existing metadata and this new set of data will be stored and queried independently of each other.
How do I query Lucene by an Id but for only one type of document? I can think of a few options, but I'd like to know what those in the know recommend from experience in order to keep Lucene manageable and fast.
I am able to break backwards compatibility with my existing setup. It would be great if the solution can be reused if I come to add another document type.
I would definitely reject the third option because of the low selectivity of the type index. There will be only 2 distinct values in the type field, each one with millions of documents. Lucene will need to merge this huge posting list with the short posting list from the idN index, which can still be very fast but is wasteful.
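To make the cost concrete, here is a minimal sketch in plain Java (no Lucene dependency) of what intersecting a short id posting list with a huge type posting list looks like. Everything in it is an assumption for illustration: the class name `PostingListSketch`, the helper `everyOther`, the doc ids, and above all the naive linear merge. Lucene's real conjunction can skip ahead in the longer list, so it does far less work than this toy model, but the second list still has to be consulted at all.

```java
import java.util.Arrays;

// Toy model only: real Lucene posting lists are compressed and support skipping.
final class PostingListSketch {

    // Hypothetical helper: sorted doc ids of one document type (here: every even doc id).
    static int[] everyOther(int maxDoc) {
        int[] docs = new int[(maxDoc + 1) / 2];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = 2 * i;
        }
        return docs;
    }

    // Intersects two sorted doc-id lists with a naive linear merge.
    // steps[0] is incremented once per comparison so the work can be inspected.
    static int[] intersect(int[] a, int[] b, long[] steps) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            steps[0]++;
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] typePostings = everyOther(1_000_000); // low selectivity: half of all docs
        int[] idPostings = {10, 11, 500_000};       // high selectivity: three docs
        long[] steps = new long[1];
        int[] hits = intersect(idPostings, typePostings, steps);
        System.out.println("hits=" + Arrays.toString(hits) + ", merge steps=" + steps[0]);
    }
}
```

In this model the id posting list has three entries, yet the merge walks a large part of the type list to confirm two of them. When each document type has its own terms and posting lists, the id lookup alone yields the answer and no second list is touched.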
The first two ways are effectively the same in the query phase, because you have different terms and posting lists for each independent type of document. The difference is in the indexing phase. Managing several independent indexes requires a bit more coordination and makes the code a little more complex. Yet it may be a good idea if you plan to use the indexes in different contexts. For example:
Otherwise, I would go with the first option, as it is simpler and more manageable.