Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing Version Control of DB Objects

I will soon be beginning work on a project that (from the spec) reminds me a bit of StackOverflow. Basically, its a web app that has user-controlled content.

One of the features that's got me going around in circles in my mind is the version control. Here on StackOverflow, each question and answer can have multiple revisions. This is pretty simple to implement when you have only one type of object (and, in this case, its text).

So, for my simple pages, I'm set.

The problem comes in when I consider that some objects that need to be under version control have relationships. To provide a concrete example, let me choose a random analagous domain:

Lets say I was implementing a Wiki-like site for keeping track of book/author info. The primary focus of the site would be to create and update "Author" pages, which, as text, is pretty simple (as above). However, let us add a one-to-many association between authors and books (in other words, books would be separate objects, as obviously a person could author many books). Each book would have a link from the Author page to an informational page about that book.

To the user, there is little difference between the text-based "summary" describing the author and the links between that author & their works. Thus, we have a requirement to implement the "revision"/edit feature for author pages, book pages, and the association between authors and books. In other words, the user should be able to edit, view history of, and rollback author pages, book pages, and associations between the two.

This becomes even more complicated when that relationship becomes a many-to-many, where multiple authors could be listed as having contributed to a book.

I have a number of solutions in mind, but none of them are as clean as I'd like (and involve at least some repeated code/redundant data storage), and, although I do see commonality all over the place here, I feel that I haven't really been able to extract it best, especially at the database level. I don't want to bias the answers given so I'm not going to give them right away.

So, how would you design this system at the database level? I'm looking for table specifications here, and possibly a description of how you'd use them, if its not immediately obvious. For those answers to which it may be relevant, I'm going to be using ASP.NET and either Linq-to-SQL (I'm comfortable with many-to-many in LTS) or Entity Framework.

EDIT: To clarify, I understand basic DB design, normalization, many-to-many mapping tables, etc. I'm looking for a clean solution to this specific situation.

EDIT 2: I'm looking for a generalizable solution, as there may be way more sub-objects in the system than just books. The author may be related to other authors, magazines, events, etc, etc, etc. I feel like I'm repeating a lot of work if I implement history individually for each one.

like image 413
JoshJordan Avatar asked Aug 16 '09 22:08

JoshJordan


2 Answers

This is a common-enough problem in data warehousing. They use "slowly-changing dimensions".

There have to be some rules, however, if you're going to try and have "versioned" data.

  1. You must record the Author-Book relationship as initially defined. This the the official Author-Book relationship. It is something the data warehouse folks call a "fact-less fact table". It's pairs of keys.

  2. Books are a dimension of the book-author fact. The book can change. There are numerous slowly-changing dimension algorithms. You can keep only the latest, have history table separate from current. Keep history and current in one table with a flag to distinguish current from history.

  3. Authors are a dimension of the book-author fact. The author can change. Again, there are numerous SCD algorithms. Read on up on the choices. By Ralph Kimball's The Data Warehouse Toolkit for more information.

Note that the relationship (author to book) is a fact, and does not need versions. It's a fact. It doesn't "change". It's either true or it was put in the database in error -- in which case, it must be removed. Facts don't need version numbers.

In a more sophisticated star schema, your facts have measures. Price, volume sold, cost, profit, etc. These are also recorded in the fact table. These pieces of information may vary with time. Therefore, you almost always have a time dimension for each fact.

Therefore, Time is a dimension of the book-author fact. If this fact can change, the applicable time period is recorded as part of the fact.

The time dimension isn't quite the same thing as a version number. It's slightly simpler. It states that at a given point in time, the fact was true. If the fact changes, you append a new fact with a different timestamp.

You can, given a particular point in time, locate the relevant facts and associated dimension values.

like image 113
S.Lott Avatar answered Sep 23 '22 17:09

S.Lott


I have a table for each table: i.e., Author and Book.

There's the usual foreign key relationship (whatever that is) between the tables.

Each table also has a history table, i.e. AuthorHistory and BookHistory. These history tables contain the old/obsolete versions of records (e.g. each deleted and/or edited Author record). There's no foreign key relationship to/from the history tables.


Edit:

Some functionality is similar for each table: for example, no matter which table, updating a record means storing the old copy of the record in te corresponding history table. I implement this functionality using database triggers (update and delete triggers for each table); because the database engine that I'm using supports triggers, and that makes it transparent to the application. The code within these triggers is similar from one table to the next (only the names of the table, and the list of field names, differs from one table to the next).


What about the many-to-many situation? This is more difficult because you could actually have no record mapping an author to a book, but have previously had one and need to show that as a history item

Edit#2:

I haven't implemented history of a many-to-many situation yet, but I don't see why it wouldn't be the same, i.e.:

  • The many-to-many relationship is implemented by there being a BookAuthor table, each of whose records is just BookId plus AuthorId.
  • Historical relationships are in a corresponding BookAuthorHistory table.
like image 33
ChrisW Avatar answered Sep 22 '22 17:09

ChrisW