
Database with "Open Schema" - Good or Bad Idea?

The co-founder of Reddit gave a presentation on issues they had while scaling to millions of users. A summary is available here.

What surprised me is point 3:

Instead, they keep a Thing Table and a Data Table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date. The Data table has three columns: thing id, key, value. There’s a row for every attribute. There’s a row for title, url, author, spam votes, etc. When they add new features they didn’t have to worry about the database anymore. They didn’t have to add new tables for new things or worry about upgrades.
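To make the quoted layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (thing, data, thing_id, ups, downs, created) and the helpers add_thing and get_attrs are illustrative assumptions based only on the summary above, not Reddit's actual schema or code.

```python
import sqlite3

# In-memory database; all names below are hypothetical, chosen to mirror the summary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (
    id      INTEGER PRIMARY KEY,
    type    TEXT    NOT NULL,              -- 'user', 'link', 'comment', ...
    ups     INTEGER NOT NULL DEFAULT 0,
    downs   INTEGER NOT NULL DEFAULT 0,
    created TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE data (
    thing_id INTEGER NOT NULL,             -- which thing this attribute belongs to
    key      TEXT    NOT NULL,             -- attribute name, e.g. 'title'
    value    TEXT,                         -- attribute value, stored as text
    PRIMARY KEY (thing_id, key)
);
""")

def add_thing(conn, type_, **attrs):
    """Insert one thing plus one data row per attribute; return the new id."""
    cur = conn.execute("INSERT INTO thing (type) VALUES (?)", (type_,))
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
        [(thing_id, k, str(v)) for k, v in attrs.items()],
    )
    return thing_id

def get_attrs(conn, thing_id):
    """Collect all key/value rows of a thing into a dict."""
    rows = conn.execute(
        "SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)
    )
    return dict(rows.fetchall())

link_id = add_thing(conn, "link", title="Hello", url="http://example.com", author="alice")
# A new kind of thing with a new attribute needs no schema change:
award_id = add_thing(conn, "award", name="gold")
```

Note that every value is stored as text here, so types and required attributes are not checked by the database in this sketch.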

This seems like a terrible idea to me, but it seems to have worked out for Reddit. Is it a good idea in general, though? Or is it a peculiarity of Reddit that happened to work out for them?

Claudiu asked May 18 '10 02:05


2 Answers

This is a data model known as EAV for entity-attribute-value. It has its uses. A prime example is patient test data which is naturally sparse since there are hundreds of thousands of tests which might be run, but typically only a handful are present for a patient. A table with hundreds of thousands of columns is silly, but a table with EAV makes good sense.
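As a rough illustration of the sparse case, the sketch below stores only the tests that were actually run, one row per (patient, test) pair, using Python's sqlite3 module. The table name test_result, its columns, the sample values, and the helper results_for are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test_result (
        patient_id INTEGER NOT NULL,   -- entity
        test_name  TEXT    NOT NULL,   -- attribute
        result     REAL    NOT NULL,   -- value
        PRIMARY KEY (patient_id, test_name)
    )
""")

# Only tests that were performed get a row; absent tests cost nothing.
conn.executemany(
    "INSERT INTO test_result (patient_id, test_name, result) VALUES (?, ?, ?)",
    [
        (1, "hemoglobin", 13.5),
        (1, "glucose", 92.0),
        (2, "glucose", 110.0),
    ],
)

def results_for(conn, patient_id):
    """Return the tests recorded for one patient as {test_name: result}."""
    rows = conn.execute(
        "SELECT test_name, result FROM test_result "
        "WHERE patient_id = ? ORDER BY test_name",
        (patient_id,),
    )
    return dict(rows.fetchall())
```

A wide table would need one nullable column per possible test; here the number of rows grows with the tests performed rather than with the tests that exist.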

wallyk answered Sep 21 '22 06:09


Most of the really big web sites end up using some sort of incredibly simple schema on the database side of things. This has the advantage that it's fast and scalable. It has the disadvantage that all the relationships that you'd get the database to enforce automatically (via constraints, triggers and such) you need to enforce yourself in your client code instead. Maintaining consistency is a pain in the neck, and there's almost always at least some chance that your data will be inconsistent, at least for short periods of time.
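A small sketch of what "enforce it yourself" can look like, assuming the thing/data layout from the question and Python's sqlite3 module. The helpers create_user and create_comment are hypothetical; the point is only that the check for a valid author lives in application code because a generic key/value column cannot carry a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT NOT NULL);
CREATE TABLE data (
    thing_id INTEGER NOT NULL,
    key      TEXT    NOT NULL,
    value    TEXT,
    PRIMARY KEY (thing_id, key)
);
""")

def create_user(conn, name):
    """Create a 'user' thing with a name attribute; return its id."""
    cur = conn.execute("INSERT INTO thing (type) VALUES ('user')")
    user_id = cur.lastrowid
    conn.execute(
        "INSERT INTO data (thing_id, key, value) VALUES (?, 'name', ?)",
        (user_id, name),
    )
    return user_id

def create_comment(conn, author_id, body):
    """Create a 'comment' thing, checking by hand that the author exists."""
    # The database cannot verify that data.value points at a user,
    # so the application performs the check a foreign key would have done.
    row = conn.execute(
        "SELECT 1 FROM thing WHERE id = ? AND type = 'user'", (author_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown author id {author_id}")
    cur = conn.execute("INSERT INTO thing (type) VALUES ('comment')")
    comment_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO data (thing_id, key, value) VALUES (?, ?, ?)",
        [(comment_id, "author", str(author_id)), (comment_id, "body", body)],
    )
    return comment_id

alice = create_user(conn, "alice")
comment = create_comment(conn, alice, "first!")
```

Any code path that writes to data without going through such a helper can still store a dangling author, and once the check and the writes are spread across separate requests or servers there is a window in which the data is inconsistent.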

For a social networking site, it's a worthwhile compromise. Data that's mostly right most of the time is adequate (e.g., who really cares if the number of up-votes you receive for an item is really 20 milliseconds out of date when it's sent), and keeping costs reasonable while scaling to support a gazillion users matters a lot.

Jerry Coffin answered Sep 19 '22 06:09