How does a search engine rank millions of pages within 1 second?

I understand the basics of search engine ranking, including ideas such as the "inverted index", the "vector space model", "cosine similarity", "PageRank", etc.

However, when a user submits a popular query term, it is very likely that millions of pages contain this term. As a result, a search engine still needs to sort these millions of pages in real time. For example, I just tried searching "Barack Obama" on Google. It shows "About 937,000,000 results (0.49 seconds)". Ranking over 900M items within 0.5 seconds? That really blows my mind!

How does a search engine sort such a large number of items within 1 second? Can anyone give me some intuitive ideas or point out references?

Thanks!
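
One piece of the intuition (a generic observation, not a claim about Google's internals): a results page only needs the best ten or so documents, and selecting the top k out of N scored items takes O(N log k) with a bounded heap rather than a full O(N log N) sort. Below is a minimal Python sketch using made-up scores to show the shape of that selection.

    import heapq
    import random

    # Toy data standing in for per-document relevance scores; in a real engine
    # these would come from the index, not be generated on the fly.
    doc_scores = [(doc_id, random.random()) for doc_id in range(1_000_000)]

    # heapq.nlargest keeps only a 10-element heap while scanning, so it never
    # sorts all one million items; only the 10 winners are fully ordered.
    top_10 = heapq.nlargest(10, doc_scores, key=lambda pair: pair[1])

    for doc_id, score in top_10:
        print(doc_id, round(score, 4))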

UPDATE:

  1. Most of the responses (including some older discussions) so far seem to attribute the credit to the "inverted index". However, as far as I know, an inverted index only helps find the "relevant pages". In other words, via the inverted index Google could obtain the 900M pages containing "Barack Obama" (out of several billion pages). However, based on the threads I have read so far, it is still not clear how those millions of "relevant pages" are actually "ranked" (see the sketch after this list).
  2. The MapReduce framework is unlikely to be the key component of real-time ranking. MapReduce is designed for batch tasks: when you submit a job to a MapReduce framework, the response time is usually at least a minute, which is clearly too slow for this requirement.
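
To make point 1 concrete, here is a toy sketch of the idea most discussions gesture at: the inverted index stores more than document IDs. Each posting can carry a precomputed query-dependent weight, and each document carries a precomputed query-independent score (e.g. PageRank), so query-time ranking mostly reduces to lookups and cheap additions. All structures, weights, and the scoring formula below are illustrative assumptions, not Google's actual design.

    from collections import defaultdict

    # Hypothetical index built offline (e.g. by a batch system such as MapReduce):
    # each term maps to a posting list of (doc_id, term_weight) pairs, and every
    # document also has a precomputed query-independent score such as PageRank.
    inverted_index = {
        "barack": [(1, 0.8), (2, 0.5), (3, 0.9)],
        "obama":  [(1, 0.7), (3, 0.6), (4, 0.4)],
    }
    static_score = {1: 0.9, 2: 0.2, 3: 0.6, 4: 0.1}

    def rank(query_terms, k=3):
        """Combine precomputed query-dependent and query-independent scores."""
        scores = defaultdict(float)
        for term in query_terms:
            for doc_id, weight in inverted_index.get(term, []):
                scores[doc_id] += weight              # query-dependent part
        for doc_id in scores:
            scores[doc_id] += static_score[doc_id]    # query-independent part
        # A real engine would use a top-k heap (as in the earlier sketch)
        # instead of sorting every candidate.
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]

    print(rank(["barack", "obama"]))

The expensive batch work (crawling, PageRank computation, index construction) is paid offline, which is why MapReduce-style systems can still be part of the story without sitting on the query path.
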
asked Oct 03 '13 by user1036719



1 Answer

The question would really only be relevant if we were sure that the ranking was complete. It is quite possible that the ordering provided is approximate.

Given the fluidity of the ranking results, no answer that looks reasonable could be considered incorrect. For example, if an entire section of the web were excluded from the top results, you would not notice, provided it was included later.

This gives the developers a degree of latitude entirely unavailable in almost all other domains.

The real question to ask is: how precisely do the results match the actual rank assigned to each page?
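
To illustrate this point with a generic early-termination technique (not a statement about Google's implementation): if posting lists are ordered by a precomputed static score and the query-dependent part of the score has a known upper bound, the engine can stop scanning as soon as no remaining document could displace the current top k. Everything beyond that point is never ranked exactly, so the reported ordering is only approximate. All names and numbers below are made up for illustration.

    import heapq

    MAX_QUERY_BONUS = 1.0  # assumed upper bound on the query-dependent score

    def top_k_approximate(postings, query_score, k):
        """postings: (doc_id, static_score) pairs sorted by static_score, descending."""
        heap = []  # min-heap of (total_score, doc_id), size <= k
        for doc_id, static in postings:
            if len(heap) == k and static + MAX_QUERY_BONUS <= heap[0][0]:
                break  # no remaining document can beat the current k-th best
            total = static + query_score(doc_id)
            if len(heap) < k:
                heapq.heappush(heap, (total, doc_id))
            else:
                heapq.heappushpop(heap, (total, doc_id))
        return sorted(heap, reverse=True)

    # Toy data: static scores descending; query bonus concentrated on a few docs.
    postings = [(3, 5.0), (8, 4.5), (1, 1.2), (6, 1.1), (2, 1.0), (9, 0.9)]
    bonus = {3: 0.2, 8: 0.9, 1: 0.3}
    print(top_k_approximate(postings, lambda d: bonus.get(d, 0.0), k=2))

Techniques in this family (impact-ordered indexes, MaxScore/WAND-style pruning) are one reason "ranking" hundreds of millions of candidates is feasible: most candidates are never scored at all.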

answered Sep 28 '22 by Pekka