How data is stored in Lucene

Tags:

lucene

I know that Lucene creates an index and stores all the data. Can anyone tell me how the data is stored in the flat files, or what kind of algorithms are used to store it in the backend so that it can be retrieved quickly?

Ramesh asked Feb 01 '12 07:02





2 Answers

I'm not sure if this is exactly what you're asking, but the general answer is that Lucene uses an inverted index. The specifics of how Lucene stores it on disk are described in the file formats documentation (as milan said).
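To make the idea concrete, here is a minimal in-memory sketch in Python. It is not Lucene's API or on-disk format, and every function name in it (tokenize, build_inverted_index, intersect, search_and) is hypothetical. It maps each term to a sorted postings list of document IDs and answers an AND query by merging postings lists, which is the basic operation an inverted index makes cheap.

```python
import re

# Hypothetical helpers for illustration only; this is not Lucene's API or file format.

def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted postings list of the doc IDs that contain it."""
    postings = {}
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings.setdefault(term, set()).add(doc_id)
    # Sorted terms mimic a term dictionary; sorted doc IDs allow a merge intersection.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def intersect(a, b):
    """Merge-intersect two sorted doc ID lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_and(index, query):
    """Return the doc IDs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = list(index.get(terms[0], []))
    for term in terms[1:]:
        result = intersect(result, index.get(term, []))
    return result


docs = [
    "Lucene stores an inverted index",
    "An index maps terms to documents",
    "Lucene is a Java library",
]
index = build_inverted_index(docs)
print(index["lucene"])                    # [0, 2]
print(search_and(index, "lucene index"))  # [0]
```

A query never scans the documents themselves; it only looks up the postings lists for its terms and merges them, which is why sorted postings matter.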

The general idea is that Lucene stores an inverted index along with auxiliary data structures that help answer queries quickly. For example, it stores a normalization value (norm) for each document in each indexed field, and each term's document frequency, from which IDF (inverse document frequency) is computed at search time. Lucene can also store the original field values (stored fields), but those live outside the inverted index.
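A sketch of such auxiliary statistics, again in Python with hypothetical names and treating each document as a single field. Rather than storing IDF directly, it keeps each term's document frequency (counted once per document) plus a per-document length norm, and derives IDF on demand. The formulas (idf = 1 + ln(N / (df + 1)), norm = 1 / sqrt(number of terms)) are illustrative classic TF-IDF choices assumed here; the exact formulas depend on the Lucene version and the configured Similarity (recent versions default to BM25).

```python
import math
import re
from collections import Counter

# Hypothetical helpers; the formulas are illustrative, not a specific Lucene version's.

def tokenize(text):
    """Lowercase and split on non-alphanumerics (a crude stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_stats(docs):
    """Return per-term document frequency and a per-document length norm."""
    doc_freq = Counter()
    norms = []
    for text in docs:
        terms = tokenize(text)
        doc_freq.update(set(terms))  # count each term at most once per document
        # Shorter documents get a larger norm, so a match in them weighs more.
        norms.append(1.0 / math.sqrt(len(terms)) if terms else 0.0)
    return doc_freq, norms


def idf(term, doc_freq, num_docs):
    """Derive IDF at query time from the stored document frequency."""
    # Counter returns 0 for unseen terms, so the +1 avoids division by zero.
    return 1.0 + math.log(num_docs / (doc_freq[term] + 1))


docs = ["lucene index", "lucene java library", "search engine"]
doc_freq, norms = build_stats(docs)
print(doc_freq["lucene"])                  # 2
print(idf("lucene", doc_freq, len(docs)))  # 1.0
print(idf("search", doc_freq, len(docs)) > idf("lucene", doc_freq, len(docs)))  # True
```

Rarer terms get a higher IDF, so a match on "search" counts for more than a match on "lucene", which appears in two of the three documents.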

Felipe Hummel answered Oct 18 '22 22:10



All of this is explained in the file formats section of the Lucene documentation.
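For the "flat file" part of the question, one building block described in Lucene's file formats documentation is the VInt: a variable-length integer that stores 7 data bits per byte, low-order bits first, and sets the high bit when more bytes follow. Older postings formats store each document ID as the gap from the previous one (a delta) written as a VInt, so small gaps take a single byte. Newer codecs pack postings in blocks, so treat this Python sketch (all function names hypothetical) as an illustration of the idea rather than the current on-disk layout.

```python
# Hypothetical helpers illustrating delta + VInt encoding; not Lucene's actual code.

def encode_vint(value):
    """Encode a non-negative int: 7 data bits per byte, high bit set if more follow."""
    if value < 0:
        raise ValueError("VInt encodes non-negative integers only")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits plus continuation flag
        value >>= 7
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def decode_vints(data):
    """Decode a byte string containing back-to-back VInts."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes belong to this value
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_postings(doc_ids):
    """Delta-encode a sorted doc ID list, then VInt-encode each gap."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += encode_vint(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data):
    """Reverse encode_postings by summing the decoded gaps."""
    doc_ids, prev = [], 0
    for gap in decode_vints(data):
        prev += gap
        doc_ids.append(prev)
    return doc_ids


data = encode_postings([5, 300, 301])
print(list(data))             # [5, 167, 2, 1]
print(decode_postings(data))  # [5, 300, 301]
```

Here the postings [5, 300, 301] become the gaps 5, 295 and 1, which encode to 4 bytes instead of 12 as fixed-width 32-bit integers.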

milan answered Oct 18 '22 21:10
