How can I diff and patch/merge strings instead of files?

I'm working on a project where people can submit stories and have other people contribute to them. Rather than simply overwriting an entry in the database, I would like to store the changes people make instead of the entire new text. Then I can dynamically apply diffs if people want to revert to a previous version. I can also easily present users who are Editors with only the modified text, so that they can jump right to the changes.

I am aware of how to take diff files and patch other files with them. But I'm making a web app with Python and Django, and I'll be storing all of these diffs in a MySQL database. Given that performance isn't a major issue for this app, I am prepared to pull the data from the DB, make files, and run git diff and patch on those files.

Is there a better way than building new files and deleting them every time I want to create a new version or apply a new diff? Is there some way to run diffs on plain text instead of files? E.g. setting variables in bash to the contents of what would be a file (but is actually data from the DB), and running git diff on them? I would like to control these actions from a Python file after the user submits a form.

I'm really just looking for a good way to get started on this problem, so any help would be greatly appreciated.

Thanks for your time,

ParagonRG

Paragon asked May 04 '12 15:05


1 Answer

I searched quite a bit for a solution to this. Python's difflib is solid, but the diffs it can rebuild text from tend to contain the entire original strings along with records of what was changed. This differs from, say, a git diff, where you only see what was changed plus some surrounding context. difflib also provides a function called unified_diff, which does produce a shorter diff, but difflib has no function for rebuilding a string from a string and such a diff. E.g. if I made a diff out of text1 and text2, called diff1, I couldn't generate text2 out of text1 and diff1.
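To make the difference concrete, here is a small sketch using only the standard library. It assumes the two versions are plain strings split on line boundaries; the variable names are just for illustration. unified_diff gives the compact, git-like output, while ndiff produces a delta that carries every line of both inputs, which is what lets difflib.restore rebuild either side.

```python
import difflib

text1 = "line 1\nline 2"
text2 = "line 1\nline 2 changed"

old_lines = text1.splitlines()
new_lines = text2.splitlines()

# unified_diff yields lines lazily; joining them gives one compact string.
# lineterm="" stops difflib from appending newlines to the header lines.
short_diff = "\n".join(difflib.unified_diff(old_lines, new_lines, lineterm=""))
print(short_diff)

# ndiff returns a delta that includes every line of both inputs,
# so restore() can rebuild either version from the delta alone.
delta = list(difflib.ndiff(old_lines, new_lines))
rebuilt_old = "\n".join(difflib.restore(delta, 1))
rebuilt_new = "\n".join(difflib.restore(delta, 2))
print(rebuilt_old == text1, rebuilt_new == text2)
```

The trade-off is size: the ndiff delta grows with the whole document, whereas the unified diff grows only with the changes and their context.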

I have therefore made a simple Python module that allows for strings to be rebuilt, both forwards and backwards, from a single string and its related diffs. It's called merge_in_memory, and can be found at https://github.com/danielmoniz/merge_in_memory. Simply pull the repository and run the setup.py.

A simple example of its usage:

import merge_in_memory as mim_module

str1 = """line 1
line 2"""
str2 = """line 1
line 2 changed"""

merger = mim_module.Merger()
print(merger.diff_make(str1, str2))

This will output:

--- 
+++ 
@@ -1,2 +1,2 @@
 line 1
-line 2
+line 2 changed

Diffs are simply strings (rather than generators, as they are when using difflib). You can create a number of diffs and apply them at once (i.e. fast-forward through a history or track back) with the diff_apply_bulk() function.

To step backwards through the history, set the reverse argument to True when calling either diff_bulk() or diff_apply_bulk(). For example:

merge = merger.diff_apply_bulk(text3, [diff1, diff2], reverse=True)

If you started with text1 and generated text2 and text3 by applying diff1 and diff2, then the line of code above rebuilds text1. Note that the list of diffs is still in ascending order. A 'merge', i.e. the result of applying a diff to a string, is itself a string.
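For readers curious how forward and reverse application of a unified diff to a string can work, here is a minimal standard-library sketch. It is not the merge_in_memory implementation: make_diff, apply_diff and apply_bulk are hypothetical names chosen for this example. It assumes diffs were produced by difflib.unified_diff with lineterm="", and it normalises line endings to "\n" and drops any trailing newline.

```python
import difflib
import re

# Matches hunk headers such as "@@ -1,2 +1,3 @@" (the counts are optional).
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def make_diff(old, new):
    """Return a unified diff of two strings as a single string."""
    lines = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="")
    return "\n".join(lines)


def apply_diff(text, diff, reverse=False):
    """Apply a unified diff to text; with reverse=True, undo it instead."""
    # When reversing, '+' lines are the ones to remove and '-' lines to add.
    remove, add = ("+", "-") if reverse else ("-", "+")
    source = text.splitlines()
    result = []
    pos = 0  # index of the next unconsumed line in source
    in_hunk = False
    for line in diff.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            in_hunk = True
            start = int(match.group(3 if reverse else 1))
            count = match.group(4 if reverse else 2)
            # A zero-length range names the line *before* the change.
            if count != "0":
                start -= 1
            result.extend(source[pos:start])  # copy untouched lines
            pos = start
        elif not in_hunk:
            continue  # skip the '---' / '+++' file header lines
        elif line.startswith(add):
            result.append(line[1:])
        elif line.startswith(remove) or line.startswith(" "):
            if pos >= len(source) or source[pos] != line[1:]:
                raise ValueError("diff does not match text at line %d" % (pos + 1))
            if line.startswith(" "):
                result.append(source[pos])  # context lines are kept
            pos += 1
    result.extend(source[pos:])
    return "\n".join(result)


def apply_bulk(text, diffs, reverse=False):
    """Apply diffs (given in ascending order) forwards or backwards."""
    for diff in (reversed(diffs) if reverse else diffs):
        text = apply_diff(text, diff, reverse=reverse)
    return text


text1 = "line 1\nline 2"
text2 = "line 1\nline 2 changed"
text3 = "line 0\nline 1\nline 2 changed"
diff1 = make_diff(text1, text2)
diff2 = make_diff(text2, text3)

print(apply_bulk(text1, [diff1, diff2]) == text3)
print(apply_bulk(text3, [diff1, diff2], reverse=True) == text1)
```

As in the description above, the list of diffs stays in ascending order for both directions; apply_bulk simply walks it backwards when reverse is set. The sketch raises ValueError when a diff does not match the text, rather than attempting any fuzzy matching.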

All of this allows me to store diffs in the database as simple VARCHARs (or what-have-you). I can pull them out in order and apply them in either direction to generate the text I want, as long as I have a starting point.

Please feel free to leave any comments about this, as it is my first Python module.

Thanks,

ParagonRG

Paragon answered Sep 19 '22 14:09