
Version-controlling zipped files (docx, odt)

Some formats, such as docx and odt, are actually zip files in disguise. If I store them directly in version control, they are handled as binary files. My ideal solution would be to:

  • have a hook that creates a foo.docx/ directory for each foo.docx file before commit, unzipping all files into it
  • optionally, have a hook that reindents the xml files
  • have a hook that recreates foo.docx from the stored files after update
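
The unpack, reindent, and repack steps in the list above could be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not a finished Mercurial hook: the function names `unpack` and `repack` and the choice of output directory are assumptions, and wiring them into hooks is left open. Reindenting with `minidom` changes whitespace in the XML, so it is worth checking that the editor still opens the rebuilt file.

```python
import zipfile
from pathlib import Path
from xml.dom import minidom


def unpack(archive, out_dir, reindent=True):
    """Unzip `archive` into `out_dir`, optionally pretty-printing XML members."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
    if reindent:
        for path in out_dir.rglob("*"):
            # .rels files inside docx archives are XML as well.
            if path.is_file() and path.suffix in {".xml", ".rels"}:
                dom = minidom.parseString(path.read_bytes())
                path.write_bytes(dom.toprettyxml(indent="  ", encoding="utf-8"))


def repack(src_dir, archive):
    """Rebuild a zip archive from the files under `src_dir`."""
    src_dir = Path(src_dir)
    names = sorted(
        p.relative_to(src_dir).as_posix()
        for p in src_dir.rglob("*") if p.is_file()
    )
    # ODF (odt) expects a "mimetype" entry first and uncompressed;
    # for docx this ordering step is simply a no-op.
    names.sort(key=lambda n: n != "mimetype")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            kind = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            zf.write(src_dir / name, name, compress_type=kind)
```

A pre-commit step would call something like `unpack("foo.docx", "foo.docx.d")` and an update step `repack("foo.docx.d", "foo.docx")`. The directory name `foo.docx.d` is hypothetical: a file and a directory cannot share the name foo.docx in the same folder, so the unpacked tree needs a different name or location.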

I don't want the docx files themselves to be version-controlled. (I am aware of a related question where a different approach with a custom diff was suggested.)

Is this doable? Is this doable with Mercurial?

UPDATE:

I know about hooks. I am interested in the specifics. Here is a session to demonstrate the expected behavior.

> hg add foo.docx
> hg status
A foo.docx
> hg commit
> # Change foo.docx with external editor
> hg status
M foo.docx
> hg diff
+++ foo.docx/word/document.xml
- <w:t>An idea</w:t>
+ <w:t>A much better idea</w:t>

Adam Schmideg, asked Sep 21 '10 22:09



2 Answers

I was wondering the same thing, and just came across the ZipDoc extension/filter for Mercurial, which seems to do exactly this!

Haven't tried it yet, but it looks promising!

Danny Tuppeny, answered Sep 19 '22 19:09


If you can get past the hurdle of successfully unzipping and rezipping the OpenOffice documents, then you should be able to use the filter system we have in Mercurial (the [encode] and [decode] sections of the configuration file). That lets you transform files on every read from and write to the repository.

You will unfortunately have to do more than just unzip the foo.docx file. The problem is that a filter must produce a single file as output, so perhaps you can unzip foo.docx and then tar up the generated files. You'll then be versioning the tarball, which should work since a tarball is just an uncompressed concatenation of all the individual files with some meta information. Come to think of it, a simpler solution would be to zip the unpacked foo.docx file again but specify no compression. That should give results similar to using tar.
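
The "zip again with no compression" idea might look like the sketch below. It is an illustration under assumptions, not a tested Mercurial filter: the function name `rezip` is made up here, and it works on bytes in memory so that a filter script could feed it the file contents it receives.

```python
import io
import zipfile


def rezip(data, compression=zipfile.ZIP_STORED):
    """Return a copy of the zip archive in `data` with every member
    rewritten using `compression` (stored, i.e. uncompressed, by default)."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
         zipfile.ZipFile(out, "w", compression) as dst:
        for info in src.infolist():
            # Keep name, timestamp and attributes so that identical
            # input always produces identical output.
            member = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            member.compress_type = compression
            member.external_attr = info.external_attr
            dst.writestr(member, src.read(info))
    return out.getvalue()
```

On the way into the repository a filter would call `rezip(data)`, so the stored revision contains the XML as plain bytes that delta well. On the way out it could call `rezip(data, zipfile.ZIP_DEFLATED)` to hand the editor a normally compressed file. Whether the result round-trips cleanly through a given office suite is something to verify before relying on it.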

Solving this problem is something I've wanted to do myself, so please report back by sending a mail to the Mercurial mailing list.

Martin Geisler, answered Sep 22 '22 19:09