Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are the standard/recommended ways to store version-controlled, database data?

I want to store a blog post in a database. I thought it would be nice to have different versions of that data, much like version controlling does for text files.

So, I imagine it working like a row in a table, that had version control. So, for example, you could retrieve the latest version of that row, or a previous version. You could even branch from that row.

Does anything like this exist?

Possibly useful info: I am currently using Python, Django & MySQL. I'm experimenting with MongoDB

Edit for clarity/more context: I'm looking for a solution more tailored towards "version control" of rows than of databases; I'm not so much interested in branching entire databases. For example, I would be able to query the content of blog post at 1/1/2011 and at 1/1/2010 (without switching databases).

like image 260
cammil Avatar asked Dec 15 '11 15:12

cammil


People also ask

How do you store a database version?

There is no generally-accepted place to store a version number in a database schema. Databases don't have version numbers by convention. There is no standard way of putting in this or any other information about an application into the database.

Which storage is best for database files?

DB provides data integrity between the file and its metadata. Database Security is available by default. Backups automatically include files, no extra management of file system necessary. Database indexes perform better than file system trees when more number of items are to be stored.

What is the need of versioning in data storage?

In the case of research data, a new version of a dataset may be created when an existing dataset is reprocessed, corrected or appended with additional data. Versioning is one means by which to track changes associated with 'dynamic' data that is not static over time.


2 Answers

First off, I must say this is an interesting question.

In my line of work, I've to save versions of various user inputs. The way I do it, and by all means I don't really know if it's the right way or not, is the following:

I have a master table and revisions table. I chose these 2 names for the example's sake only.

What master does is stores the following info:

  • id (autoincrement)
  • version_id (int)

What revisions store is the following:

  • id
  • master_id
  • version_id
  • rest of the relevant data about entered entity (dates, etc)

What I achieved this way is that I had gotten an ID of, let's say a blog post. If someone edits the post, I'll store that info to revisions table. Via triggers I'm incrementing the version_id in revisions table. After that I update master table with the latest version_id number. That way I don't have to perform MAX() when I want to see what the latest version is.

That way I obtained simple, yet powerful version system of website's content. It's easy to see changes, and it's also extremely fast to obtain data if you abuse some MySQL cool features (in my actual tables, I'm abusing InnoDB's clustered primary key to the max. so the db design is slightly diff. than the one I posted here).

like image 114
N.B. Avatar answered Oct 07 '22 15:10

N.B.


Version control is a complicated topic; doing it right is really challenging which is essentially why even using e.g. git can be tough. I wouldn't want to write a full blown version control system.

For simpler requirements, consider this structure, in pseudo-mongodb/JSON:

BlogPost {
    "_id": ObjectId("..."),
    "slug" : "how-to-version-my-posts",
    "author" : "cammil",
    "published" : date,
    "lastModified" : date,
    "publicVersion" : 32,
    "draftVersion" : 34,
    "teaserText" : "lorem ipsum dolor sit amet..."
}

BlogPostBody {
    "_id" : ObjectId("..."),
    "Version" : 32,
    "Text" : "lorem ipsum dolor sit amet..."
}

So the idea here is to store each version separately and a pointer to the current public version and the current version for editors, bloggers, etc.

My answer is a little MongoDB centric (because I built a MongoDB based blog engine for home use), but it should work similarly for any storage system.

Advantages:

  • No need to do MAX queries of the version number for either public or private posts
  • Does not correlate last edited to version numbers, which might not be desirable
  • Allows versioning even if a certain version is already published
  • Can fetch the teaser w/o having to fetch the entire article

Disadvantages:

  • Copies the entire text every time. Not a real concern for textual data I guess (try to type 1GB...). Will be a problem for bigger blogging sites, however. Mitigation: Compress text using deflate, delta-compression.
  • Needs to update two objects on update
like image 20
mnemosyn Avatar answered Oct 07 '22 15:10

mnemosyn