I know that lucene creates an index and stores all the data .Can any one tell me how the data is stored in flat file? or what kind of algorithms they use to store the data in backend so that they can retrieve it quickly?
Lucene is not a database — as I mentioned earlier, it's just a Java library.
Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index for each word. Since the index is an exact string-match, unordered, it can be extremely fast.
Apache Lucene™ is a high-performance, full-featured search engine library written entirely in Java. It is a technology suitable for nearly any application that requires structured search, full-text search, faceting, nearest-neighbor search across high-dimensionality vectors, spell correction or query suggestions.
Apache Solr is a subproject of Apache Lucene, which is the indexing technology behind most recently created search and index technology. Solr is a search engine at heart, but it is much more than that. It is a NoSQL database with transactional support.
Don't know if this is what you asked for. But the more general answer is that they use/implement a Inverted Index. The specifics of how Lucene stores it you can find in file formats (as milan said).
But the general idea is that they store a Inverted Index data structure and other auxiliar data structures to help answer queries quickly. For example, it stores a vector of norms for each document and each term's IDF (inverse document frequency). Lucene also stores the actual document fields, but that is outside the Inverted Index.
You can find all that explained in the file formats section.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With