How can Google be so fast?

Tags:

algorithm

What are the technologies and programming decisions that make Google able to serve a query so fast?

Every time I search for something (several times a day), it amazes me that the results come back in about a second or less. What sort of configuration and algorithms could they have in place to accomplish this?

Side note: It is kind of overwhelming to think that even a desktop application running on my own machine would probably not be half as fast as Google. Keep on learning, I say.

Here are some of the great answers and pointers provided:

Google Platform
Map Reduce
Carefully crafted algorithms
Hardware - cluster farms and massive numbers of cheap computers
Caching and Load Balancing
Google File System

240

asked Sep 25 '08 09:09

Jorge Ferreira

2 Answers

Latency is killed by disk accesses. Hence it's reasonable to believe that all data used to answer queries is kept in memory. This implies thousands of servers, each replicating one of many shards. Therefore the critical path for search is unlikely to hit any of their flagship distributed systems technologies: GFS, MapReduce or BigTable. Roughly speaking, those are more likely used to process crawler results.
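To make the "many shards, each replicated, all in memory" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `NUM_SHARDS`, `REPLICAS_PER_SHARD`, `shard_for`, `build_cluster` and `lookup` are hypothetical, and hashing a term with MD5 to pick a shard is just one simple way to partition data, not something Google is known to do.

```python
import hashlib
import random

NUM_SHARDS = 4            # assumed value, purely illustrative
REPLICAS_PER_SHARD = 3    # assumed value, purely illustrative


def shard_for(term: str) -> int:
    """Map a term to a shard number with a stable hash (hypothetical scheme)."""
    digest = hashlib.md5(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


def build_cluster(postings):
    """Copy each term's document list into every replica of its shard.

    cluster[shard][replica] is a plain dict held in memory: term -> doc ids.
    """
    cluster = [[{} for _ in range(REPLICAS_PER_SHARD)] for _ in range(NUM_SHARDS)]
    for term, doc_ids in postings.items():
        for replica in cluster[shard_for(term)]:
            replica[term] = list(doc_ids)
    return cluster


def lookup(cluster, term, rng=random):
    """Answer from any one replica of the owning shard; no disk is touched."""
    replica = rng.choice(cluster[shard_for(term)])
    return replica.get(term, [])
```

Because every replica of a shard holds the same data, any of them can answer, which is what lets load be spread across machines and lets a slow or dead replica be skipped.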

The handy thing about search is that there's no need to have either strongly consistent results or completely up-to-date data, so Google are not prevented from responding to a query because a more up-to-date search result has become available.

So a possible architecture is quite simple: front-end servers process the query, normalising it (possibly by stripping out stop words etc.), then distribute it to whatever subset of replicas owns that part of the query space. (An alternative architecture is to split the data up by web pages, so that one member of every replica set needs to be contacted for every query.) Many, many replicas are probably queried, and the quickest responses win. Each replica has an index mapping queries (or individual query terms) to documents, which it can use to look up results in memory very quickly. If different results come back from different sources, the front-end server can rank them as it spits out the HTML.
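The flow described above (normalise, fan out, look up in an in-memory index, merge and rank) can be sketched in Python roughly as follows. This uses the "split by web pages" variant, where every shard is asked about every query. The stop-word list, the `Shard` class, the `front_end` function and the term-count scoring are all assumptions made for illustration; replica selection and "quickest response wins" are left out to keep it short.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

STOP_WORDS = {"the", "a", "an", "is", "of", "how", "so"}  # assumed list


def normalise(query):
    """Lower-case the query and strip stop words."""
    return [word for word in query.lower().split() if word not in STOP_WORDS]


class Shard:
    """Holds an in-memory inverted index over its own slice of the documents."""

    def __init__(self, docs):
        # term -> {doc_id: number of occurrences of the term in that document}
        self.index = defaultdict(dict)
        for doc_id, text in docs.items():
            for word in text.lower().split():
                self.index[word][doc_id] = self.index[word].get(doc_id, 0) + 1

    def search(self, terms):
        """Score each local document by summed term counts (placeholder ranking)."""
        scores = defaultdict(int)
        for term in terms:
            for doc_id, count in self.index.get(term, {}).items():
                scores[doc_id] += count
        return list(scores.items())


def front_end(query, shards, top_k=3):
    """Fan the query out to all shards in parallel, then merge and rank."""
    terms = normalise(query)
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = pool.map(lambda shard: shard.search(terms), shards)
        hits = [hit for partial in partials for hit in partial]
    # Highest score first; ties broken by document id for a stable order.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:top_k]
```

For example, with one shard holding `{1: "google is fast", 2: "fast cars"}` and another holding `{3: "slow disk access", 4: "google search is fast"}`, the call `front_end("How is Google so fast", shards)` returns `(doc_id, score)` pairs with documents 1 and 4 ranked ahead of document 2.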

Note that this is probably a long way from what Google actually do: they will have engineered the life out of this system, so there may be more caches in strange areas, weird indexes and some kind of funky load-balancing scheme, amongst other possible differences.

155

answered Oct 22 '22 22:10

HenryR

It's a bit too much to put in one answer. See http://en.wikipedia.org/wiki/Google_platform

answered Oct 23 '22 00:10

Vasil
