I have already decided to use the Horde Text_Diff engine on a LAMP stack for calculating and rendering diffs. My question is this:
What would be a good way of actually storing the incrementals in a database? I've never had to design this kind of database application before, and it appears that most engines want a fully serialized copy of the entire original and changed text in order to render the differences.
If that's the case, then how can I store the data of the diff in a database without storing the entire new document?
(NOTE: For this particular purpose, it will always be current version->proposed diff->new current version, meaning that I'm trying to store an actual diff instead of a reverse diff.)
For wiki applications, consider storing
StoredEdition[X] = diff(Edition[X+1], Edition[X])
where Edition[0] is the oldest, for example in a table "articles_revisions" in which each row has a timestamp and refers to an articleID. Sorry, at the moment I don't have a suggestion for tools to reconstitute text from serial diffs or reverse diffs.
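A minimal sketch of that reverse-diff scheme, written in Python only so that it runs with the standard library. It assumes diff(Edition[X+1], Edition[X]) means "the delta that turns Edition[X+1] back into Edition[X]", and that the newest edition is kept in full. The names make_delta, apply_delta, store_reverse and load_edition are hypothetical, and the delta format is made up for illustration; it is not the Horde Text_Diff format.

```python
from difflib import SequenceMatcher

def make_delta(src, dst):
    """Ops that rebuild dst from src; unchanged lines are kept as index ranges only."""
    a, b = src.splitlines(True), dst.splitlines(True)
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b, autojunk=False).get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))   # reuse src lines i1..i2
        elif tag != "delete":              # "replace" or "insert"
            ops.append(("add", b[j1:j2]))  # only the differing lines are stored
        # deleted lines need no entry: they are simply never copied
    return ops

def apply_delta(src, ops):
    a = src.splitlines(True)
    out = []
    for op in ops:
        out.extend(a[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(out)

def store_reverse(editions):
    """Return the newest edition in full plus one reverse delta per older edition."""
    stored = [make_delta(editions[x + 1], editions[x])
              for x in range(len(editions) - 1)]
    return editions[-1], stored

def load_edition(latest, stored, x):
    """Walk backwards from the newest edition until Edition[x] is reached."""
    text = latest
    for delta in reversed(stored[x:]):
        text = apply_delta(text, delta)
    return text
```

With this layout the current text is always available without applying anything, and only older editions cost replay work.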
I think you should be able to work with the diff and patch utilities. diff produces the difference between two texts (or files) as a patch that contains only the changes, and patch applies it. That patch can then be stored in the database. You still need the original text and all patches up to the latest revision.
In PHP, the xdiff extension can create and apply diffs for strings and files.
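To show what "the changes only" looks like, here is a small sketch that produces a unified-diff patch as text. It uses Python's difflib purely as a stand-in so the example is self-contained; on the LAMP stack the same role would be played by your diff engine or the xdiff extension. The helper name make_patch and the "current"/"proposed" labels are my own.

```python
from difflib import unified_diff

def make_patch(old, new, context=1):
    """Return a unified diff from old to new as a single string.

    Only changed lines plus `context` neighbouring lines appear in the
    result, so the patch is usually much smaller than the document.
    """
    return "".join(unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="current",
        tofile="proposed",
        n=context,
    ))
```

The returned string is what would go into the database. Python's standard library has no function for applying a unified diff, so this sketch covers only the creation side.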
To store the diffs in the database you need to preserve three things: the order of the diffs, the contents of each diff, and the original text.
I assume you are already storing the original text. The diffs can then be stored in a diffs table that holds a reference to the original text, an auto-increment key to preserve the order, and the text content of each diff. Insert the diffs one after the other in the correct order and you should be fine.
To recreate the current version, query the original text and all of its diffs in order, then apply the diffs one after the other until you reach the version you want.
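The storage and replay steps above can be sketched as follows. This is an illustration under assumptions: it uses SQLite from Python so it runs on its own (on MySQL the key would be declared AUTO_INCREMENT), the table and column names are invented, and apply_patch is a hypothetical callable standing in for whatever your diff engine uses to apply one stored diff to a text.

```python
import sqlite3

SCHEMA = """
CREATE TABLE documents (
    id       INTEGER PRIMARY KEY,
    original TEXT NOT NULL
);
CREATE TABLE diffs (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- insertion order = revision order
    document_id INTEGER NOT NULL REFERENCES documents(id),
    body        TEXT NOT NULL                       -- serialized diff, opaque to the database
);
"""

def open_db():
    """Create an in-memory database with the two tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def add_diff(db, document_id, body):
    """Append one diff to a document's history and return its key."""
    cur = db.execute(
        "INSERT INTO diffs (document_id, body) VALUES (?, ?)",
        (document_id, body),
    )
    return cur.lastrowid

def current_text(db, document_id, apply_patch):
    """Replay every stored diff, oldest first, on top of the original text."""
    (text,) = db.execute(
        "SELECT original FROM documents WHERE id = ?", (document_id,)
    ).fetchone()
    rows = db.execute(
        "SELECT body FROM diffs WHERE document_id = ? ORDER BY id", (document_id,)
    )
    for (body,) in rows:
        text = apply_patch(text, body)
    return text
```

Keeping the diff body opaque means the table does not care which engine produced it, as long as the same engine applies it again.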
Alternatively, you can create another table that stores the full result of specific revisions, so that you do not have to replay a long chain of diffs over and over again. The trade-off is that this makes the data in the database redundant.
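A rough sketch of that snapshot idea, kept in memory to stay short. The assumptions: revision 0 is the original text, diffs[i] turns revision i into revision i + 1, snapshots maps a revision number to its full text, and apply_patch is again a hypothetical callable supplied by your diff engine. The function names and the interval of 50 are arbitrary choices of mine.

```python
def rebuild(revision, original, diffs, snapshots, apply_patch):
    """Return the text at `revision`, starting from the nearest earlier snapshot."""
    # Latest snapshot at or before the requested revision; 0 means "start from the original".
    start = max((r for r in snapshots if r <= revision), default=0)
    text = snapshots.get(start, original)
    # Only the diffs between the snapshot and the target have to be applied.
    for diff in diffs[start:revision]:
        text = apply_patch(text, diff)
    return text

def maybe_snapshot(revision, text, snapshots, every=50):
    """Store the full text of every `every`-th revision."""
    if revision and revision % every == 0:
        snapshots[revision] = text
```

A smaller interval means fewer diffs to replay per read and more redundant text in storage, so the value is worth tuning against how often old revisions are actually requested.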