Bigtable database design theory

I am very well versed in the theory and practice of relational database design.

I know what works and what doesn't, what is performant and what is maintainable (almost - there's always room to tweak once you start having real data).

But I can't seem to find a substantial body of knowledge on distributed, scalable databases such as Google's Bigtable (for writing apps for Google App Engine). What works, what doesn't, what will scale, what won't, and why?

Sure, there are some blog posts and articles, but are there books or academic research papers on designing databases for Bigtable and similar database paradigms?

asked Sep 30 '09 07:09

flybywire

2 Answers

... are there books or academic research papers on designing databases for bigtable and similar database paradigms?

Well, Bigtable is essentially a database itself, so I take it your question is really about how to model your data and design your schema in Bigtable-like databases. More specifically, you would like to know how to do this on Google App Engine.

With GAE you will be using the Datastore API, which adds a significant layer of abstraction over Bigtable, so to some extent you don't have to worry about low-level details as you would with something like HBase. There are a few posts on SO (here's a great answer by a Google engineer who I think is part of the GAE team) that will guide you and offer hints on how to approach this new type of database system.

Helpful Info:

HBase was inspired by Google's Bigtable paper
Hypertable was also inspired by the Bigtable paper
Cassandra's data model was inspired by the Bigtable paper
Hadoop was inspired by Google's GFS and MapReduce papers

answered Sep 24 '22 06:09

fuentesjr

There's not much recent literature on non-relational database design that I'm aware of - though you might gain some valuable insights by digging up old papers from before the relational paradigm 'won'.

The basic insight of databases like Bigtable is, of course, that in web apps and other read-heavy applications, given the availability of cheap disk storage, the best approach is to optimize for reads and do more work on writes. Normalization does the opposite - it minimizes replication of data on disk, making writes easier and cheaper but reads harder. Pretty much all the differences from relational database design arise from this single fact.
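To make that trade-off concrete, here is a minimal sketch in plain Python, using in-memory dicts rather than any real Datastore or Bigtable API. The names (`users`, `posts`, `posts_denorm`, and the helper functions) are hypothetical and chosen only for illustration. The normalized read needs two lookups; the denormalized read needs one, but renaming a user has to touch every copy.

```python
# Normalized layout: each fact is stored once, so reads must "join".
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
posts = [
    {"id": 10, "author_id": 1, "title": "Hello"},
    {"id": 11, "author_id": 2, "title": "Scaling"},
]


def read_normalized(post_id):
    """Two lookups per read: find the post, then fetch its author."""
    post = next(p for p in posts if p["id"] == post_id)
    author = users[post["author_id"]]
    return {"title": post["title"], "author_name": author["name"]}


# Denormalized layout: the write copies the author's name into the post.
posts_denorm = {}


def write_denormalized(post_id, author_id, title):
    """Extra work at write time: duplicate the author name into the row."""
    posts_denorm[post_id] = {
        "title": title,
        "author_id": author_id,
        "author_name": users[author_id]["name"],
    }


def read_denormalized(post_id):
    """One lookup per read: everything needed is already in the row."""
    return posts_denorm[post_id]


def rename_user(user_id, new_name):
    """The cost of duplication: every copy of the name must be rewritten."""
    users[user_id]["name"] = new_name
    for post in posts_denorm.values():
        if post["author_id"] == user_id:
            post["author_name"] = new_name
```

If posts are read far more often than users are renamed, paying for the loop in `rename_user` in exchange for single-lookup reads is usually the better deal.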

The other consequence - one that could use more attention - is that when you optimize for reads, you have to know what type of reads you will be engaging in ahead of time, while normalized structures are more or less read-agnostic.
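As a rough illustration of that point, the sketch below models a table whose rows are kept sorted by key, which is how Bigtable orders rows. It is plain Python, not a real client library; `SortedTable`, `message_key`, and the `user#timestamp` key layout are assumptions made up for this example. The key is designed for one query ("all messages for a user, oldest first"), which becomes a cheap prefix scan. A query the key was not designed for ("all messages on a topic") falls back to a full scan unless you maintain a second table keyed by topic.

```python
import bisect


class SortedTable:
    """Toy key-value table that keeps its row keys in sorted order."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> row value

    def put(self, key, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = value

    def scan_prefix(self, prefix):
        """Yield rows whose key starts with prefix, touching only those rows."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            yield key, self._rows[key]

    def full_scan(self):
        """Yield every row; the fallback for queries the key does not serve."""
        for key in self._keys:
            yield key, self._rows[key]


def message_key(user, timestamp):
    # Zero-padding keeps lexicographic order equal to numeric order.
    return f"{user}#{timestamp:010d}"


table = SortedTable()
table.put(message_key("bob", 150), {"topic": "db", "text": "hello"})
table.put(message_key("alice", 200), {"topic": "ops", "text": "again"})
table.put(message_key("alice", 100), {"topic": "db", "text": "hi"})

# The read we planned for: a contiguous range of keys.
alice_messages = [row["text"] for _, row in table.scan_prefix("alice#")]

# A read we did not plan for: every row has to be examined.
db_messages = [row["text"] for _, row in table.full_scan() if row["topic"] == "db"]
```

A normalized relational schema with an index on `topic` would serve both queries without the design having to anticipate either; here the choice of row key has to be made up front.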

answered Sep 25 '22 06:09

Nick Johnson

Tags:

database-design

google-app-engine

bigtable
