How does Lucene work

2 Answers

Lucene is an inverted full-text index. This means that it takes all the documents, splits them into words, and then builds an index that maps each word to the documents containing it. Since looking up a word is an exact string match, it can be extremely fast. Hypothetically, an unordered SQL index on a varchar column could be just as fast, and in fact I think you'll find the big databases can answer a simple string-equality query very quickly in that case.
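To make the idea concrete, here is a toy inverted index. It is a minimal sketch, assuming a naive lowercase-and-split tokenizer (the hypothetical `tokenize` helper) in place of a real Lucene analyzer, and a Python dict in place of Lucene's term dictionary, which is a sorted, compressed on-disk structure rather than a hash table. `build_inverted_index` and `search` are illustrative names, not Lucene APIs.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on runs of non-word characters
    # (a crude stand-in for a real Lucene Analyzer).
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_inverted_index(docs):
    """Map each word to the sorted list of doc IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    # Store each posting list as a sorted list of doc IDs.
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, word):
    # Exact-match lookup: one dictionary probe, no scan over the documents.
    return index.get(word.lower(), [])


docs = [
    "Lucene is a search library",
    "Solr and Elasticsearch are built on Lucene",
    "Postgres is a relational database",
]
index = build_inverted_index(docs)
print(search(index, "lucene"))    # [0, 1]
print(search(index, "database"))  # [2]
```

The cost of a query depends on the word being looked up and the length of its posting list, not on re-reading the documents, which is the point the answer makes.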

Lucene also does not have to optimize for transaction processing. When you add a document, it need not ensure that queries see it instantly, and it need not optimize for updates to existing documents.
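The trade-off above can be illustrated with a toy model. In this hypothetical sketch (not Lucene's API), added documents sit in a buffer and only become searchable when `refresh()` turns them into a new immutable segment, and an update is a delete of the old version plus an add of the new one. This loosely mirrors how Lucene's `IndexWriter` behaves (new documents become visible only to a newly opened reader, and `updateDocument` deletes and then adds), but `ToyIndex` and its methods are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude lowercase word splitter standing in for a real analyzer.
    return [t for t in re.split(r"\W+", text.lower()) if t]


class ToyIndex:
    """Hypothetical sketch of buffered writes; not Lucene's actual API."""

    def __init__(self):
        self.segments = []            # frozen segments: term -> list of doc numbers
        self.buffer = []              # (doc_number, text) added since the last refresh
        self.pending_deletes = set()  # doc numbers to hide at the next refresh
        self.deleted = set()          # doc numbers currently hidden from searches
        self.current = {}             # external key -> latest doc number
        self.next_doc = 0

    def add(self, key, text):
        # Appending is cheap: nothing searchable changes until refresh().
        self.buffer.append((self.next_doc, text))
        self.current[key] = self.next_doc
        self.next_doc += 1

    def update(self, key, text):
        # No in-place edit: mark the old version deleted, then add a new version.
        if key in self.current:
            self.pending_deletes.add(self.current[key])
        self.add(key, text)

    def refresh(self):
        # Turn the buffer into a new immutable segment and apply pending deletes.
        segment = defaultdict(list)
        for doc, text in self.buffer:
            for term in set(tokenize(text)):
                segment[term].append(doc)
        self.segments.append(dict(segment))
        self.buffer = []
        self.deleted |= self.pending_deletes
        self.pending_deletes = set()

    def search(self, word):
        hits = []
        for segment in self.segments:
            hits.extend(d for d in segment.get(word.lower(), []) if d not in self.deleted)
        return sorted(hits)


idx = ToyIndex()
idx.add("a", "Lucene indexes text")
print(idx.search("lucene"))  # [] (not visible until refresh)
idx.refresh()
print(idx.search("lucene"))  # [0]
idx.update("a", "Solr indexes text")
idx.refresh()
print(idx.search("lucene"))  # []
print(idx.search("solr"))    # [1]
```

Because segments are never modified in place, adds only append and deletes only record which documents to hide. Reclaiming that space is left to later segment merges, which this sketch does not model.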

However, at the end of the day, if you really want to know, you need to read the source. Both things you reference are open source, after all.

answered Oct 09 '22 23:10

bmargulies

Lucene creates a big index. For each word (term), the index stores the number of documents containing it and the positions of the word in each of those documents. So when you give a single-word query, it just looks the word up in the index, which is close to O(1) regardless of how many documents there are; the work that grows is reading that word's list of matching documents. The results are then ranked using a relevance-scoring algorithm. For a multi-word query where every word must match, it takes the intersection of the sets of documents containing each word. This is why Lucene is so fast.
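The structure this answer describes (document frequency plus positions per word) and the multi-word intersection can be sketched as follows. This is a toy model under stated assumptions: the helper names (`build_positional_index`, `intersect`, `and_query`, `phrase_query`) are hypothetical, the tokenizer is naive, and Lucene's real postings are compressed on-disk formats rather than Python dicts. Ranking is left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Crude lowercase word splitter standing in for a real analyzer.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_positional_index(docs):
    """term -> {doc_id: [positions]}; len(index[term]) is the term's document frequency."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def intersect(a, b):
    """Merge two sorted doc-ID lists, keeping IDs in both: O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(index, words):
    """Documents containing every word; start from the shortest posting list."""
    postings = sorted((sorted(index.get(w.lower(), {})) for w in words), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


def phrase_query(index, words):
    """Documents where the words occur consecutively, checked via stored positions."""
    terms = [w.lower() for w in words]
    hits = []
    for doc in and_query(index, terms):
        starts = index[terms[0]][doc]
        if any(all(p + k in index[t][doc] for k, t in enumerate(terms)) for p in starts):
            hits.append(doc)
    return hits


docs = ["new york times", "york is new", "the times of new york"]
index = build_positional_index(docs)
print(and_query(index, ["new", "york"]))     # [0, 1, 2]
print(phrase_query(index, ["new", "york"]))  # [0, 2]
```

Starting from the rarest word keeps the running intersection small, and the stored positions are what let a phrase query reject documents like "york is new" that contain both words in the wrong order.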

For more info, read this article by the Google developers: http://infolab.stanford.edu/~backrub/google.html

answered Oct 09 '22 21:10

alienCoder
