There are formats that are actually zip files in disguise, e.g. docx or odt. If I store them directly in version control, they are handled as binary files. My ideal solution would be
to have a foo.docx/ directory for each foo.docx file,
to unzip all files into it before commit, and
to recreate foo.docx from the stored files after update.
I don't want the docx files themselves to be version-controlled. (I am aware of a related question where a different approach with a custom diff was suggested.)
Is this doable? Is this doable with Mercurial?
UPDATE:
I know about hooks. I am interested in the specifics. Here is a session to demonstrate the expected behavior.
> hg add foo.docx
> hg status
A foo.docx
> hg commit
> # Change foo.docx with external editor
> hg status
M foo.docx
> hg diff
+++ foo.docx/word/document.xml
- <w:t>An idea</w:t>
+ <w:t>A much better idea</w:t>
A docx file consists of a collection of XML files contained inside a ZIP archive. The contents of a new Word document can be viewed by unzipping it. The XML files fall into several categories, for example metadata files, which contain information about the other files in the archive.
The very latest MSysGit (aka Git for Windows) now has both zip and unzip on the shell code side, so you can use them in aliases.
In Python's zipfile module, the extractall() method of a ZipFile object extracts all the contents of the zip file, by default to the current working directory. You can also call the extract() method to extract a single file by specifying its path in the zip file.
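As a rough illustration of how extractall() could serve the "unzip into a directory, rebuild later" idea from the question, here is a minimal sketch. The function names unpack and repack are made up for this example, and it assumes the rebuilt archive only needs the same members in the same order, not a byte-identical file.

```python
import os
import zipfile


def unpack(archive, target_dir):
    """Extract every member of a zip-based document into target_dir.

    Returns the member names in archive order so the file can be rebuilt.
    """
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(target_dir)
    return names


def repack(source_dir, archive, names):
    """Rebuild an archive from the extracted files, keeping member order."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            if name.endswith("/"):
                continue  # skip explicit directory entries
            zf.write(os.path.join(source_dir, name), arcname=name)
```

The member order is returned and reused because some zip-based formats care about which entry comes first (odt is one such case, as far as I know), so sorting the names alphabetically may not be safe.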
I was wondering the same thing, and just came across the ZipDoc extension/filter for Mercurial, which seems to do exactly this!
Haven't tried it yet, but it looks promising!
If you can get past the hurdle of successfully unzipping and zipping the OpenOffice documents, then you should be able to use the filter system we have in Mercurial. That lets you transform files on every read/write from/to the repository.
You will unfortunately have to do more than just unzip the foo.docx file. The problem is that you need to generate a single file as output, so perhaps you can unzip foo.docx and then tar up the generated files. You'll then be versioning the tarball, which should work since a tarball is just an uncompressed concatenation of all the individual files with some meta information. Come to think of it, a simpler solution would be to zip the unpacked foo.docx file again but specify no compression. That should give similar results as using tar.
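The "zip again with no compression" idea might look something like the following sketch. It is untested against a real Mercurial setup; the function name recompress is invented here, and wiring it into the [encode] and [decode] filter sections of hgrc (which accept pipe: commands) would still need a small stdin-to-stdout wrapper around it.

```python
import io
import zipfile


def recompress(data, method):
    """Rewrite a zip archive held in memory using one compression method.

    Pass zipfile.ZIP_STORED for the form kept in the repository and
    zipfile.ZIP_DEFLATED to get a normally compressed document back.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            # Carry over name and timestamp instead of stamping the current time.
            member = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            member.compress_type = method
            member.external_attr = info.external_attr
            dst.writestr(member, src.read(info))
    return out.getvalue()
```

With ZIP_STORED the XML appears verbatim inside the archive, so a small edit to the document should only change a small part of the stored file. The zip headers are still binary, so hg diff may continue to treat the file as binary; the expected gain is in how well revisions delta against each other.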
Solving this problem is something I've wanted to do myself, so please report back by sending a mail to the Mercurial mailing list.