
How To Store Text Diffs in DB?

Tags: php, mysql, diff

I have already decided to use the Horde Text_Diff engine in a LAMP stack for calculating diffs and rendering them. My question is this:

What would be a good way of actually storing the incrementals in a database? I've never had to design this kind of database application before, and it appears that most engines want a fully serialized copy of the entire original and changed text in order to render the differences.

If that's the case, then how can I store the data of the diff in a database without storing the entire new document?

(NOTE: For this particular purpose, it will always be current version->proposed diff->new current version, meaning that I'm trying to store an actual diff instead of a reverse diff.)

asked Jun 27 '11 17:06 by JRL


2 Answers

For Wiki applications, consider storing:

  1. The full text of the most recent edition (to facilitate e.g. searching and rapid display), e.g. in a table "articles".
  2. Older editions as reverse diffs against the next newer edition. Each prior edition could be stored as StoredEdition[X] = diff(Edition[X+1], Edition[X]), where Edition[0] is the oldest, e.g. in a table "articles_revisions", with each row having a timestamp and referring to an articleID.

Sorry, at this moment I don't have a suggestion for tools to reconstitute text from serial diffs or reverse-diffs.
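As a rough illustration of the reverse-diff scheme above, here is a minimal in-memory sketch. It is written in Python rather than PHP purely so it runs with the standard library alone. The JSON delta format and the names `make_delta`, `apply_delta` and `Article` are hypothetical, invented for this example; they are not the output format of Horde Text_Diff or any other tool.

```python
import difflib
import json


def make_delta(src: str, dst: str) -> str:
    """Describe how to turn `src` into `dst` without storing all of `dst`."""
    a = src.splitlines(keepends=True)
    b = dst.splitlines(keepends=True)
    ops = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])   # reuse src lines i1 .. i2-1
        elif tag in ("replace", "insert"):
            ops.append(["add", b[j1:j2]])  # only lines absent from src are stored
        # "delete": nothing to record, the lines are simply not copied
    return json.dumps(ops)


def apply_delta(src: str, delta: str) -> str:
    """Rebuild the target text from `src` and a delta made by make_delta."""
    a = src.splitlines(keepends=True)
    out = []
    for op in json.loads(delta):
        if op[0] == "copy":
            out.extend(a[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "".join(out)


class Article:
    """Keeps the latest text in full and older editions as reverse deltas."""

    def __init__(self, text: str):
        self.latest = text
        # reverse_deltas[x] turns Edition[x+1] into Edition[x]
        self.reverse_deltas = []

    def save(self, new_text: str) -> None:
        self.reverse_deltas.append(make_delta(new_text, self.latest))
        self.latest = new_text

    def edition(self, x: int) -> str:
        text = self.latest
        # Walk backwards from the newest edition down to Edition[x].
        for delta in reversed(self.reverse_deltas[x:])  :
            text = apply_delta(text, delta)
        return text
```

With this layout the newest edition costs nothing to read, and the cost of reading an old edition grows with its distance from the newest one.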

answered Sep 20 '22 10:09 by user359981


I think you should be able to work with the diff and patch utilities. diff produces the difference between two texts (or files) as a list of the changes only, and patch applies such a list to the original. The resulting patch can then be stored in the database. You still need the original text and all patches up to the latest revision.

In PHP, the xdiff extension can be used to create diffs for strings and files.

Storing DIFFs in the database

To store the diffs in the database you need to preserve the order of the diffs, the contents of each diff, and the original text.

I assume you are already storing the original text. The diffs can then be stored in a diffs table that contains a reference to the original text, an auto-increment key to preserve the order, and the text contents of each diff. As long as you insert the diffs one after the other in the correct order, you should be fine.

To recreate the current version, query the original version and all of its diffs in order. Then apply one diff after the other until you reach the version you want.
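The storage and replay steps described above could look roughly like the following. This is only a sketch under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for PHP and MySQL, the table and column names (`documents`, `diffs`, `delta`) are made up for the example, and the JSON delta produced by the hypothetical `make_delta` helper stands in for whatever patch format your diff engine emits.

```python
import difflib
import json
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE documents (
    id       INTEGER PRIMARY KEY,
    original TEXT NOT NULL
);
CREATE TABLE diffs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order = revision order
    document_id INTEGER NOT NULL REFERENCES documents(id),
    delta       TEXT NOT NULL
);
"""


def make_delta(src: str, dst: str) -> str:
    """Encode how to turn `src` into `dst`; unchanged lines are referenced, not copied."""
    a = src.splitlines(keepends=True)
    b = dst.splitlines(keepends=True)
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])
        elif tag in ("replace", "insert"):
            ops.append(["add", b[j1:j2]])
    return json.dumps(ops)


def apply_delta(src: str, delta: str) -> str:
    """Apply a delta made by make_delta to `src`."""
    a = src.splitlines(keepends=True)
    out = []
    for op in json.loads(delta):
        out.extend(a[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(out)


def text_at(conn: sqlite3.Connection, document_id: int,
            revisions: Optional[int] = None) -> str:
    """Replay the first `revisions` diffs (all of them if None) on the original."""
    (text,) = conn.execute(
        "SELECT original FROM documents WHERE id = ?", (document_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT delta FROM diffs WHERE document_id = ? ORDER BY id", (document_id,)
    ).fetchall()
    for (delta,) in rows[:revisions]:
        text = apply_delta(text, delta)
    return text


def add_revision(conn: sqlite3.Connection, document_id: int, new_text: str) -> None:
    """Store only the delta between the current version and `new_text`."""
    delta = make_delta(text_at(conn, document_id), new_text)
    conn.execute(
        "INSERT INTO diffs (document_id, delta) VALUES (?, ?)", (document_id, delta)
    )
```

Ordering by the auto-increment key is what makes the replay deterministic. If revisions could ever be inserted out of order, an explicit revision-number column would be a safer choice.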

Alternatively, you can create another table that stores the full result of specific revisions, so that you do not have to replay a long chain of diffs over and over again. The price is that this makes the data in the database redundant.
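To make the trade-off concrete, here is a small Python sketch of how a stored full-text revision shortens the replay. It assumes revisions are numbered from 0 (the original text) and that you know which revision numbers have a full copy stored. The function name `replay_plan` is hypothetical.

```python
import bisect
from typing import List, Tuple


def replay_plan(stored_full: List[int], target: int) -> Tuple[int, List[int]]:
    """Pick the closest stored full revision at or before `target`.

    `stored_full` is a sorted list of revision numbers that have a full
    copy stored; it must contain 0, the original text.
    Returns (starting revision, diff numbers to apply in order).
    """
    if target < 0:
        raise ValueError("target revision must not be negative")
    idx = bisect.bisect_right(stored_full, target) - 1
    start = stored_full[idx]
    return start, list(range(start + 1, target + 1))
```

For example, with full copies stored at revisions 0, 50 and 100, rebuilding revision 73 starts from revision 50 and applies diffs 51 through 73, instead of all 73 diffs from the original.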

answered Sep 22 '22 10:09 by hakre