Bigtable Practical Example

Tags:

bigtable

Can someone provide a real-world example of how data would be structured within a Bigtable? Please talk from a search engine, social networking or any other familiar point of view which illustrates clearly and pragmatically how the row -> column family -> column combo is superior to traditional normalized relational approaches.

212

asked Jul 21 '11 10:07

S. Valmont

2 Answers

Reading the original Google white paper was helpful:

http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en//papers/bigtable-osdi06.pdf

As was this comprehensive list of information sources on Google data architecture:

http://highscalability.com/google-architecture

Update: 11/4/14

A new version of the Google white paper PDF can be found here:

http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf

answered Sep 25 '22 18:09

S. Valmont

I believe the difference is more about the way the data are queried rather the way they are stored.

The main difference between relational databases and NoSQL is that there is, um, no SQL in the latter.

This means you (not the query optimizer) write the query plans yourself.

This may increase the query performance if you know how to do that.

Consider a typical search engine query: find top 10 pages with all (or some) words included, say, "wet t-shirt contest", ordered by relevance (we're leaving word proximity aside for simplicity sake).

To do this, you need all words split and kept in a searchable and iterable list ordered by (word, relevance, source). Then you partition this list into (3 * ranks) sets (each starting at the top of the words in your search query at a given rank), where ranks is the possible number or ranks, say, 1 to 10; and join the sets on source, .

In a relational database it would look like this:

SELECT  w1.source
FROM    ranks r1
JOIN    words w1
ON      w1.word = 'wet'
        AND w1.rank = r1.value
CROSS JOIN
        ranks r2
JOIN    words w2
ON      w2.word = 'shirt'
        AND w2.rank = r2.value
        AND w2.source = w1.source
CROSS JOIN
        ranks r3
JOIN    words w3
ON      w3.word = 'contest'
        AND w3.rank = r2.value
        AND w3.source = w1.source
ORDER BY
        relevance_formula (w1.rank, w2.rank, w3.rank)
LIMIT 10

This is best executed using a MERGE JOIN over the three sets partitioned by rank.

However, no optimizer I'm aware of will build this plan (leaving aside the fact that relevance_formula may not distribute over the individual ranks).

To work around this, you should implement your own query plan: start at the top of each word/rank pair and just descend all three sets simultaneously, skipping the missing values and using search rather then next if you feel that there will be too much to skip in one of the sets.

Thus said, relational approach gives you a more convenient way to query data at cost of possible performance penalty.

If you are developing a campus web server, then writing those SELECT * is OK even they are executed one microsecond longer than they possibly could be. But if you're developing a Google, it worth spending some time on optimizing the queries (which pure relational systems only allowing access to their data using SQL just would not let to do).

The such called NoSQL and relational databases sometimes diffuse into each other. For instance, Berkeley DB is a well-known NoSQL storage engine which was used by MySQL as its storage backend to allow SQL queries. And vice versa, HandlerSocket allows pure key-value queries to a relational InnoDB store with a MySQL database built over it.

answered Sep 23 '22 18:09

Quassnoi

Related questions
                            
                                What is the benefit of a Key-Value Store over Bigtable?
                            
                                Reduce access time to Bigtable with Golang
                            
                                How to connect to a running bigtable emulator from java
                            
                                Trying to simulate cell level TTL in bigtable but whole column family data is getting removed by garbage collection
                            
                                SQL query : inner joins optimization between big tables
                            
                                Do a mass db.delete on App Engine, without eating CPU
                            
                                What aspect of relational databases makes it difficult for them to scale sufficiently on services like Google App Engine?
                            
                                How to connect to Bigtable Emulator from a GoLang application? How to use it?
                            
                                Database design - google app engine
                            
                                GQL query for <missing>
                            
                                Bigtable database design theory
                            
                                Database that consumes less disk space
                            
                                parent->child relationships in appengine python (bigtable)
                            
                                Querying for a value existing in a model's list property in AppEngine
                            
                                30 million records a day, SQL Server can't keep up, other kind of database system needed?
                            
                                Google Cloud Bigtable backup and recovery
                            
                                what exactly is a map dimension in a multi-dimensional map?
                            
                                What it the difference between Hbase and BigTable?
                            
                                Tree structures in a nosql database

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Bigtable Practical Example

Tags:

bigtable

S. Valmont

People also ask

2 Answers

S. Valmont

Quassnoi

Recent Activity

Donate For Us