How do search engines merge results from an inverted index?
For example, if I looked up the words "dog" and "bat" in an inverted index, I would get two huge lists, one for each word, of every document that contains it.
I doubt that a search engine walks through these lists one document at a time, looking for documents that appear in both. What is done algorithmically to make this merging process blazing fast?
Search engines actually do merge these document lists. They get good performance from other techniques, the most important of which is pruning. For example, the documents for every word are stored in order of decreasing PageRank, so to get results that have a chance of making it into the first 10 (the ones shown to the user), you may need to traverse only a fairly small portion of the dog and bat lists, say, the first thousand entries. (There is also caching, of course, but that is separate from the query execution algorithm itself.)
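To make the merge-plus-pruning idea concrete, here is a minimal sketch, not how any particular engine does it. It assumes that doc IDs were assigned in order of decreasing PageRank (ID 0 is the best document), so a posting list sorted by ascending doc ID is also sorted by decreasing rank. The function name `intersect_top_k` and the parameters `k` and `max_scan` are made up for illustration.

```python
def intersect_top_k(postings_a, postings_b, k=10, max_scan=1000):
    """Two-pointer intersection of two posting lists.

    Assumption: both lists are sorted by ascending doc ID, and doc IDs
    were assigned in order of decreasing static rank, so the first
    matches found are also the best-ranked ones.
    """
    results = []
    i = j = 0
    # Pruning: look at no more than the first max_scan entries of each list.
    limit_a = min(len(postings_a), max_scan)
    limit_b = min(len(postings_b), max_scan)
    # Stop early as soon as k matches have been collected.
    while i < limit_a and j < limit_b and len(results) < k:
        a, b = postings_a[i], postings_b[j]
        if a == b:
            results.append(a)  # document contains both words
            i += 1
            j += 1
        elif a < b:
            i += 1  # advance the list that is behind
        else:
            j += 1
    return results


dog = [1, 3, 4, 7, 9, 12, 15]
bat = [2, 3, 7, 8, 12, 20]
print(intersect_top_k(dog, bat, k=2))
```

Each step advances at least one pointer, so the walk is linear in the number of entries scanned, and it stops once `k` results are found. Note that cutting both lists off at `max_scan` is a heuristic: a document that appears early in one list but late in the other will be missed, which is the price of pruning.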
Besides, there are not that many documents about dogs and about bats: even if there are millions, merging them takes a fraction of a second with a good implementation.
P.S. I worked at our country's leading search engine, though not on the core engine of its flagship search product. I talked to its developers and was surprised to learn that the query execution algorithms are actually fairly dumb: it turns out you can squeeze a huge amount of computation into acceptable time bounds. It is all heavily optimized, of course, but there is no magic and no miracles.