Database with "Open Schema" - Good or Bad Idea?

The co-founder of Reddit gave a presentation on issues they had while scaling to millions of users. A summary is available here.

What surprised me is point 3:

Instead, they keep a Thing table and a Data table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attributes like up/down votes, a type, and creation date. The Data table has three columns: thing id, key, value. There's a row for every attribute: a row for title, url, author, spam votes, etc. When they added new features they didn't have to worry about the database anymore. They didn't have to add new tables for new things or worry about upgrades.
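To make the layout concrete, here is a minimal sketch of the two tables using Python's built-in sqlite3 module. The summary does not give Reddit's actual schema, so the table names, column names, and the helpers `create_thing` and `load_attrs` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: common attributes live on the thing itself.
    CREATE TABLE thing (
        id      INTEGER PRIMARY KEY,
        type    TEXT NOT NULL,
        ups     INTEGER NOT NULL DEFAULT 0,
        downs   INTEGER NOT NULL DEFAULT 0,
        created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
    -- Everything else is one row per attribute: thing id, key, value.
    CREATE TABLE data (
        thing_id INTEGER NOT NULL,
        key      TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (thing_id, key)
    );
""")

def create_thing(conn, thing_type, **attrs):
    """Insert a thing and one data row per attribute; return the new id."""
    cur = conn.execute("INSERT INTO thing (type) VALUES (?)", (thing_type,))
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
        [(thing_id, k, str(v)) for k, v in attrs.items()],
    )
    return thing_id

def load_attrs(conn, thing_id):
    """Collect all key/value rows of a thing into a dict."""
    rows = conn.execute(
        "SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)
    )
    return dict(rows.fetchall())

link_id = create_thing(
    conn, "link", title="Hello", url="http://example.com", author="alice"
)
# A new feature needs a new attribute: no ALTER TABLE, just another row.
conn.execute(
    "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
    (link_id, "spam_votes", "0"),
)
```

Note that in this sketch every value is stored as text, so types and required fields are no longer checked by the database.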

This seems like a terrible idea to me, but it seems to have worked out for Reddit. Is it a good idea in general, though? Or is it a peculiarity of Reddit that happened to work out for them?

214

asked May 18 '10 02:05

Claudiu

2 Answers

This is a data model known as EAV for entity-attribute-value. It has its uses. A prime example is patient test data which is naturally sparse since there are hundreds of thousands of tests which might be run, but typically only a handful are present for a patient. A table with hundreds of thousands of columns is silly, but a table with EAV makes good sense.
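The sparsity argument can be illustrated with a small sketch. The patient ids, test names, values, and the catalogue size of 200,000 are made up for the example; `results_for` and `wide_table_cells` are hypothetical helpers, not part of any library.

```python
# Assumed size of the test catalogue ("hundreds of thousands" of possible tests).
NUM_POSSIBLE_TESTS = 200_000

# EAV rows: (entity, attribute, value). Only tests that were run are stored.
eav_rows = [
    ("patient-1", "hemoglobin", "13.5"),
    ("patient-1", "glucose", "92"),
    ("patient-2", "glucose", "110"),
]

def results_for(rows, patient):
    """Gather the attribute/value pairs recorded for one patient."""
    return {attr: val for ent, attr, val in rows if ent == patient}

def wide_table_cells(rows, num_columns):
    """Cells a one-column-per-test table would need for the same patients."""
    patients = {ent for ent, _, _ in rows}
    return len(patients) * num_columns

print(results_for(eav_rows, "patient-1"))
print(len(eav_rows), "EAV rows versus",
      wide_table_cells(eav_rows, NUM_POSSIBLE_TESTS), "wide-table cells")
```

With these made-up numbers, three EAV rows stand in for a wide table of 400,000 cells, nearly all of which would be empty.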

193

answered Sep 21 '22 06:09

wallyk

Most of the really big web sites end up using some sort of incredibly simple design on the database side of things. This has the advantage that it's fast and scalable. It has the disadvantage that all the relationships you'd otherwise get the database to enforce automatically (via constraints, triggers and such) you need to enforce yourself in your client code instead. Maintaining consistency is a pain in the neck, and there's almost always at least some chance that your data will be inconsistent, at least for short periods of time.

For a social networking site, it's a worthwhile compromise. Data that's mostly right most of the time is adequate (e.g., who really cares if the number of up-votes shown for an item is 20 milliseconds out of date when it's sent?), and keeping costs reasonable while scaling to support a gazillion users matters a lot.
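As a rough illustration of what enforcing relationships in client code can look like, here is a sketch using Python's sqlite3 module. The schema, the `ALLOWED_KEYS` rules, and the `set_attr` helper are assumptions for the example, not anything Reddit is known to use.

```python
import sqlite3

# Hypothetical per-type rules that a conventional schema would express
# as columns and constraints; here the application has to know them.
ALLOWED_KEYS = {
    "link": {"title", "url", "author"},
    "comment": {"body", "author"},
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    -- No foreign key: the database will accept any thing_id and any key.
    CREATE TABLE data (
        thing_id INTEGER,
        key      TEXT,
        value    TEXT,
        PRIMARY KEY (thing_id, key)
    );
""")

def set_attr(conn, thing_id, key, value):
    """Write one attribute after doing the checks the database no longer does."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT type FROM thing WHERE id = ?", (thing_id,)
        ).fetchone()
        if row is None:
            raise ValueError(f"no thing with id {thing_id}")
        if key not in ALLOWED_KEYS.get(row[0], set()):
            raise ValueError(f"{key!r} is not valid for type {row[0]!r}")
        conn.execute(
            "INSERT OR REPLACE INTO data (thing_id, key, value) VALUES (?, ?, ?)",
            (thing_id, key, value),
        )

conn.execute("INSERT INTO thing (id, type) VALUES (1, 'link')")
conn.commit()
set_attr(conn, 1, "title", "Hello")
```

Any code path that writes to the data table without going through a helper like this can still leave orphaned or invalid rows, which is the consistency risk described above.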

answered Sep 19 '22 06:09

Jerry Coffin
