I have two chunks of text that I would like to compare and see which words/lines have been added/removed/modified in Python (similar to a Wiki's Diff Output).
I have tried difflib.HtmlDiff but it's output is less than pretty.
Is there a way in Python (or external library) that would generate clean looking HTML of the diff of two sets of text chunks? (not just line level, but also word/character modifications within a line)
There's diff_prettyHtml()
in the diff-match-patch library from Google.
Generally, if you want some HTML to render in a prettier way, you do it by adding CSS.
For instance, if you generate the HTML like this:
import difflib
import sys
fromfile = "xxx"
tofile = "zzz"
fromlines = open(fromfile, 'U').readlines()
tolines = open(tofile, 'U').readlines()
diff = difflib.HtmlDiff().make_file(fromlines,tolines,fromfile,tofile)
sys.stdout.writelines(diff)
then you get green backgrounds on added lines, yellow on changed lines and red on deleted. If I were doing this I would take take the generated HTML, extract the body, and prefix it with my own handwritten block of HTML with lots of CSS to make it look good. I'd also probably strip out the legend table and move it to the top or put it in a div so that CSS can do that.
Actually, I would give serious consideration to just fixing up the difflib module (which is written in python) to generate better HTML and contribute it back to the project. If you have a CSS expert to help you or are one yourself, please consider doing this.
I recently posted a python script that does just this: diff2HtmlCompare (follow the link for a screenshot). Under the hood it wraps difflib and uses pygments for syntax highlighting.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With