 

What's the most compact way to store diffs in a database?

I want to implement something similar to Wikimedia's revision history. What would be the best PHP functions, libraries, extensions, or algorithms to use?

I would like the diffs to be as compact as possible, but I'm happy to be restricted to only showing the difference between each revision and its immediate predecessor, and to only being able to roll back one revision at a time.

In some cases only a few characters may change, whereas in other cases the whole string could change. I'm keen to understand whether some techniques are better for small changes than for large ones, and whether in some cases it's more efficient to simply store whole copies.

Backing the whole system with something like Git or SVN seems a bit extreme, and I don't really want to store files on disk.

Tim asked Feb 09 '12 at 19:02



2 Answers

It is much easier to store each record in its entirety than to store diffs of them. Then, if you want a diff of two revisions, you can generate one on demand using the PEAR Text_Diff library.
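To illustrate "generate the diff on demand" outside PHP, here is a minimal sketch using Python's standard `difflib` module in place of Text_Diff. It assumes both revisions are already loaded as full strings; the function name `revision_diff` and the `r1`/`r2` labels are hypothetical.

```python
import difflib


def revision_diff(old_text: str, new_text: str,
                  old_label: str = "r1", new_label: str = "r2") -> str:
    """Build a unified diff between two stored full revisions, on demand.

    Nothing diff-related is persisted; the diff is derived from the two
    full copies only when a user asks to see it.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(difflib.unified_diff(old_lines, new_lines,
                                        fromfile=old_label, tofile=new_label))


if __name__ == "__main__":
    r1 = "The quick brown fox\njumps over the dog.\n"
    r2 = "The quick brown fox\njumps over the lazy dog.\n"
    print(revision_diff(r1, r2))
```

Because only full copies are stored, any two revisions can be compared this way, not just adjacent ones.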

I like to store all versions of the record in a single table and retrieve the most recent one with MAX(revision), a "current" boolean attribute, or similar. Others prefer to denormalize and have a mirror table that holds non-current revisions.
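A minimal sketch of the single-table approach, using Python's built-in `sqlite3` so it runs without a server. The table `page_revision`, its columns, and the helper names are hypothetical. Computing the next revision number with `MAX(revision) + 1` assumes a single writer; concurrent writers would need a transaction or retry on the primary-key conflict.

```python
import sqlite3
from typing import Optional, Tuple


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one row per (page, revision)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE page_revision (
            page_id  INTEGER NOT NULL,
            revision INTEGER NOT NULL,
            body     TEXT    NOT NULL,
            PRIMARY KEY (page_id, revision)
        )""")
    return conn


def save_revision(conn: sqlite3.Connection, page_id: int, body: str) -> int:
    """Store the full text as a new revision and return its number."""
    (current,) = conn.execute(
        "SELECT COALESCE(MAX(revision), 0) FROM page_revision WHERE page_id = ?",
        (page_id,)).fetchone()
    rev = current + 1
    conn.execute(
        "INSERT INTO page_revision (page_id, revision, body) VALUES (?, ?, ?)",
        (page_id, rev, body))
    return rev


def latest_revision(conn: sqlite3.Connection,
                    page_id: int) -> Optional[Tuple[int, str]]:
    """Fetch (revision, body) of the most recent revision via MAX(revision)."""
    return conn.execute(
        """SELECT revision, body FROM page_revision
           WHERE page_id = ? AND revision = (
               SELECT MAX(revision) FROM page_revision WHERE page_id = ?)""",
        (page_id, page_id)).fetchone()
```

Rolling back one revision is then just saving the previous revision's body again as a new revision, which keeps the history append-only.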

If you store diffs instead, your schema and algorithms become much more complex. You need to store at least one "full" revision plus a series of "diff" revisions, and you must reconstruct a full version from the chain of diffs whenever you need one. (This is roughly how SVN stores things. Git's object model stores a full snapshot of each revision rather than diffs, although it delta-compresses objects later when it packs them.)
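To show the extra machinery the diff approach needs, here is a simplified sketch of forward deltas. It is not SVN's actual storage format. Each delta is a list of "copy this slice of the previous revision" and "insert this new text" operations derived from `difflib.SequenceMatcher`, and a revision is rebuilt by replaying the chain from the base. The helper names and the operation format are hypothetical. Deltas would still need serializing (for example to JSON) before going into a database column. Character-level matching can also be slow on very large texts.

```python
import difflib
from typing import List, Tuple, Union

# ("copy", start, end) reuses old[start:end]; ("insert", text) adds new text.
Op = Union[Tuple[str, int, int], Tuple[str, str]]


def make_delta(old: str, new: str) -> List[Op]:
    """Describe `new` as copies from `old` plus inserted text."""
    ops: List[Op] = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": nothing from old survives, so no operation is emitted.
    return ops


def apply_delta(old: str, delta: List[Op]) -> str:
    """Rebuild the newer revision from the older one and its delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


def reconstruct(base: str, deltas: List[List[Op]], upto: int) -> str:
    """Rebuild revision `upto` (0 = base) by replaying forward deltas."""
    text = base
    for delta in deltas[:upto]:
        text = apply_delta(text, delta)
    return text
```

Reading revision *n* costs *n* delta applications, which is why real delta-based stores periodically save a full copy or use smarter delta chains.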

Programmer time is expensive, but disk space is usually cheap. Please consider whether storing each revision in full is really a problem.
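One way to check whether full storage is really a problem, and to answer the question's small-change versus large-change concern, is to measure both options per revision. The sketch below is a heuristic of my own, not something from this answer. It compares the size of a context-free unified diff with the size of the zlib-compressed full text. The helper names are hypothetical, and a fairer comparison might compress the diff too.

```python
import difflib
import zlib


def diff_size(old: str, new: str) -> int:
    """Bytes needed to store a unified diff (no context lines) from old to new."""
    diff = difflib.unified_diff(old.splitlines(keepends=True),
                                new.splitlines(keepends=True), n=0)
    return len("".join(diff).encode("utf-8"))


def full_size(new: str) -> int:
    """Bytes needed to store the whole new revision, zlib-compressed."""
    return len(zlib.compress(new.encode("utf-8"), 9))


def choose_storage(old: str, new: str) -> str:
    """Return 'diff' or 'full', whichever is smaller for this revision."""
    return "diff" if diff_size(old, new) < full_size(new) else "full"
```

Small edits to long texts tend to favour the diff, while rewrites favour the full copy, because the diff then carries both the removed and the added text uncompressed.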

Francis Avila answered Nov 14 '22 at 22:11



Ask yourself which the end user will retrieve more often: full revisions, or diffs between revisions? Depending on the answer, store either diffs or whole revisions in the database. To produce the diffs, I would use the standard Unix diff utility.

Backing the whole system with something like Git or SVN seems a bit extreme

Why? As far as I recall, GitHub stores its wikis that way. ;)

wikp answered Nov 14 '22 at 22:11
