I am adding billions of rows to a Lucene index; each row is almost 6,000 bytes. Is there any limit on the maximum number of rows that can be added to a Lucene index? How much space would a billion rows of 6,000 bytes each occupy in a Lucene index, and is there any limit on that size?
A Lucene Index Is an Inverted Index
A term combines a field name with a token. The terms created from the non-text fields in the document are pairs consisting of the field name and the field value. The terms created from text fields are pairs of field name and token.
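To make the term structure above concrete, here is a minimal Python sketch of a toy inverted index. It is not Lucene's implementation or API: the function name `build_inverted_index`, the `text_fields` parameter, and the naive lowercase whitespace tokenizer are all assumptions made for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs, text_fields):
    """Map each (field, token-or-value) term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field, value in doc.items():
            if field in text_fields:
                # Text field: one term per token (naive lowercase whitespace split,
                # an assumption; real analyzers do much more).
                tokens = value.lower().split()
            else:
                # Non-text field: the whole field value forms a single term.
                tokens = [str(value)]
            for token in tokens:
                index[(field, token)].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents.
docs = [
    {"id": "a1", "body": "Lucene builds an inverted index"},
    {"id": "b2", "body": "An index maps terms to documents"},
]
index = build_inverted_index(docs, text_fields={"body"})
print(index[("body", "index")])  # [0, 1]
print(index[("id", "a1")])       # [0]
```

A lookup is then a single dictionary access on the (field, token) pair, rather than a scan over every document.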
When using the default Sitefinity CMS search service (Lucene), the search index definition (the configuration that specifies which content is indexed) is stored in your website database, and the actual search index files are stored on the file system. By default, the search index files are in the ~/App_Data/Sitefinity/Search/ folder.
Why is Lucene faster? Lucene is very fast at searching for data because of its inverted index technique. Normally, data sources structure data as objects or records, which in turn have fields and values. An inverted index reverses this: it maps each term to the documents that contain it.
Lucene or Apache Lucene is an open-source Java library used as a search engine. Elasticsearch is built on top of Lucene. Elasticsearch converts Lucene into a distributed system/search engine for scaling horizontally.
See Lucene documentation for its limitations, it cannot have more than
For such large datasets, it is generally a good idea to use Lucene only for its inverted index and to store the actual content of the documents somewhere else. You can expect the index size to be roughly 30% of the size of the original corpus of documents, provided these are regular documents; computationally generated documents with a lot of unique terms would produce a much bigger index.
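Applied to the numbers in the question, the 30% rule of thumb gives a rough estimate. The sketch below is back-of-the-envelope arithmetic only: the helper name `estimate_sizes` is hypothetical, the ratio is the approximation quoted above rather than a measured value, and sizes are in decimal terabytes.

```python
def estimate_sizes(num_docs, bytes_per_doc, index_ratio=0.30):
    """Return (raw corpus bytes, estimated index bytes) under an assumed index/corpus ratio."""
    corpus_bytes = num_docs * bytes_per_doc
    index_bytes = corpus_bytes * index_ratio
    return corpus_bytes, index_bytes

# One billion rows of 6,000 bytes each, as in the question.
corpus_bytes, index_bytes = estimate_sizes(1_000_000_000, 6_000)
print(f"corpus: {corpus_bytes / 1e12:.1f} TB, index: {index_bytes / 1e12:.1f} TB")
# corpus: 6.0 TB, index: 1.8 TB
```

So a billion such rows amount to about 6 TB of raw data and, if the rule of thumb holds, an index on the order of 1.8 TB. Actual size depends heavily on which fields are stored and how many unique terms the data contains.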