Subversion diff for zipped xml file

Tags: diff, svn, zip, meld

I'm using MySQL Workbench to maintain the database schema for an application. The .mwb file that Workbench uses, which is a zipped XML document, is kept in a Subversion repository.

The file is treated as binary data by Subversion, so I cannot use svn diff to show the changes, for example before committing.

Since the data is really XML, I'm thinking there might be some way to show the diff anyway, maybe a script that unzips the file first, or a plugin for svn diff.

The ideal solution would allow this:

$ svn diff db-model.mwb

or even using Meld:

$ meld db-model.mwb

What approach can you think of to accomplish this? Maybe someone else has had this problem of showing diffs for archived text files in Subversion.

Oskar asked Sep 01 '09 07:09


2 Answers

Subversion allows you to use external differencing tools. What you can do is write a wrapper script and tell Subversion to use it as its "diff" command. Your wrapper would parse the arguments it gets from Subversion to pick out the "left" and "right" filenames, operate on them, and return an exit code that Subversion will interpret as success or failure. In your case, the wrapper could unzip the XML files and pass the unzipped results to "diff" or another tool of your choice.

Subversion will balk at diffing files that were detected as "binary" when they were checked in. The "--force" option lets you override this check, so your wrapper script will be run even if the input files are checked in as binaries.
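
A minimal sketch of such a wrapper in Python, not a tested tool. It assumes that Subversion passes the two file paths as the last two arguments and the display labels after -L flags, and that the XML sits in an archive member named document.mwb.xml (the name used by the script in the other answer). The names read_lines and main are made up for this sketch, and Python's difflib stands in for an external "diff" program.

```python
import difflib
import sys
import zipfile

# Assumed name of the XML member inside a .mwb archive.
XML_MEMBER = "document.mwb.xml"


def read_lines(path):
    """Return the model XML as lines if path is a zip archive, else the raw file as lines."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            data = archive.read(XML_MEMBER)
    else:
        with open(path, "rb") as handle:
            data = handle.read()
    return data.decode("utf-8", errors="replace").splitlines(keepends=True)


def main(argv):
    # Assumption: the last two arguments are the left and right file paths.
    left, right = argv[-2], argv[-1]
    # Assumption: each -L flag is followed by a label to show in the diff header.
    labels = [argv[i + 1] for i, arg in enumerate(argv[:-2]) if arg == "-L"]
    if len(labels) < 2:
        labels = [left, right]

    diff = list(difflib.unified_diff(read_lines(left), read_lines(right),
                                     fromfile=labels[0], tofile=labels[1]))
    sys.stdout.writelines(diff)
    # Same convention as diff: 0 means no differences, 1 means differences found.
    return 1 if diff else 0
```

To use it as a script, end the file with sys.exit(main(sys.argv)), make it executable, and point Subversion at it, for example: svn diff --force --diff-cmd /path/to/wrapper.py db-model.mwb (the wrapper path is a placeholder).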

Jim Lewis answered Nov 18 '22 16:11


I've written a diff script for Workbench files that can be integrated with TortoiseSVN and TortoiseGit. It does exactly what Jim Lewis suggests: it extracts the actual XML from the archive and diffs it.

The script also removes the _ptr_ attribute noise from the diff. Merging is not supported and would be more complicated: you would have to work out how the _ptr_ attributes behave, re-pack the XML into the archive, and decide what to do with the other metadata in the archive.
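
To illustrate what removing that noise amounts to, here is a small sketch. The two XML lines are invented for the example and only assume that a _ptr_ value is an 8-digit hexadecimal string, as the script's pattern does; strip_ptr is a made-up helper name.

```python
import re

# Pattern for an 8-digit hexadecimal _ptr_ attribute (assumed format).
PTR_ATTRIBUTE = re.compile(r'_ptr_="[0-9a-fA-F]{8}"')


def strip_ptr(xml):
    """Remove every _ptr_ attribute from an XML string."""
    return PTR_ATTRIBUTE.sub("", xml)


# Two hypothetical saves of the same node; only the pointer value differs.
first_save = '<value _ptr_="0A1B2C3D" type="string" key="name">customer</value>'
second_save = '<value _ptr_="0F00BA42" type="string" key="name">customer</value>'

# The raw lines differ, so a plain diff would report a change.
print(first_save == second_save)
# Once the attribute is stripped, the lines compare equal.
print(strip_ptr(first_save) == strip_ptr(second_save))
```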

The Python script is available on Pastebin under CC-BY 3.0:

http://pastebin.com/AcD7dBNH

# extensions: mwb
# TortoiseSVN Diff script for MySQL Workbench scheme files
# 2012 by Oliver Iking, Z-Software GmbH, oliverikingREPLACETHISWITHANATz-software.net, http://www.z-software.net/
# This work is licensed under a Creative Commons Attribution 3.0 Unported License - http://creativecommons.org/licenses/by/3.0/

# Produces two diffable documents that contain only the schema-relevant data, not the full MWB content.
# Merging is not possible

# Open your TortoiseSVN (or TortoiseSomething) settings, go to the "Diff Viewer" tab and click on "Advanced". Add 
# a row with the extension ".mwb" and a command line of 
# "path\to\python.exe" "path\to\diff-mwb.py" %base %mine
# Apply changes and now you can diff mysql workbench scheme files

import sys
import zipfile
import os
import time
import tempfile
import re

# MySQL Workbench XML has _ptr_ attributes which change on each save for almost every XML node.
# Remove this visual litter so that actual changes stand out.
def sanitizeMwbXml( xml ):
    # zipfile returns bytes, so match with a bytes pattern
    return re.sub(br'_ptr_="[0-9a-fA-F]{8}"', b'', xml)

try:
    # sys.argv[0] is the script itself, so two documents mean at least three entries
    if len(sys.argv) < 3:
        print("Not enough parameters, cannot diff documents!")
        sys.exit(1)

    docOld = sys.argv[1]
    docNew = sys.argv[2]

    if not os.path.exists(docOld) or not os.path.exists(docNew):
        print("Documents don't exist, cannot diff!")
        sys.exit(1)

    # Workbench files are actually zip archives
    zipA = zipfile.ZipFile( docOld, 'r' )
    zipB = zipfile.ZipFile( docNew, 'r' )

    # mkdtemp creates a new, uniquely named directory; os.tempnam was removed in Python 3
    tempSubpath = tempfile.mkdtemp(prefix="mwbcompare")

    # docOld is %base and docNew is %mine in the TortoiseSVN command line
    docA = os.path.join( tempSubpath, "base.document.mwb.xml" )
    docB = os.path.join( tempSubpath, "mine.document.mwb.xml" )

    # Read, sanitize and write actual scheme XML contents to temporary files
    docABytes = sanitizeMwbXml(zipA.read("document.mwb.xml" ))
    docBBytes = sanitizeMwbXml(zipB.read("document.mwb.xml" ))
    zipA.close()
    zipB.close()

    # Binary mode, because the sanitized contents are bytes
    with open(docA, "wb") as docAFile:
        docAFile.write(docABytes)
    with open(docB, "wb") as docBFile:
        docBFile.write(docBBytes)

    os.system("TortoiseProc /command:diff /path:\"" + docA + "\" /path2:\"" + docB + "\"")

    # TortoiseProc will spawn a subprocess so we can't delete the files. They're in the tempdir, so they
    # will be cleaned up eventually
    #os.unlink(docA)
    #os.unlink(docB)

    sys.exit(0)
except Exception as e:
    print(str(e))
    # Sleep, or the command window will close
    time.sleep(5)

Oliver answered Nov 18 '22 16:11