I'd like to convert the output of diff
(on a Markdown file) to
Markdown with <strike>
and <em>
tags, so that I can see what has
been removed from or added to a new version of a document. (This kind of
treatment is very common for legal documents.)
Example of hoped-for output:
Why do we
Westudy programming languages?notNot in order to ...
One of the many difficulties is that diff's output is line-oriented, where I want to see differences in individual words. Does anyone have suggestions as to what algorithm to use, or what software to build on?
Use wdiff. It already does the word-by-word comparison you're looking for; converting its output to markdown should take just a few simple regular expressions.
For example:
$ cat foo
Why do we study programming languages? Not in order to
$ cat bar
We study programming languages not in order to
$ wdiff foo bar
[-Why do we-]{+We+} study programming [-languages? Not-] {+languages not+} in order to
$ wdiff foo bar | sed 's|\[-|<em>|g;s|-]|</em>|g;s|{+|<strike>|g;s|+}|</strike>|g'
<em>Why do we</em><strike>We</strike> study programming <em>languages? Not</em> <strike>languages not</strike> in order to
Edit: Actually, wdiff has some options that make it even easier:
$ wdiff -w '<em>' -x '</em>' -y '<strike>' -z '</strike>' foo bar
<em>Why do we</em><strike>We</strike> study programming <em>languages? Not</em> <strike>languages not</strike> in order to
Use Markdown-Diff to have the word diff annotated to your original document. It formats wdiff
or git --word-diff
's output in Markdown, so you can use your favorite Markdown previewer or compiler to review changes. (Markdown-Diff was written by myself, inspired by Adam Rosenfield's answer.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With