Sometimes our project tree contains binary files, such as jpg, png, doc, xls, or pdf. Can Git, Mercurial, SVN, or other tools do a good job when only a small part of a binary file is changed?
For example, suppose the spec is a 4MB .doc file that is part of the repository. If it is edited 100 times during the year, changing just 1 or 2 lines each time, and checked in after every edit, then it takes up 400MB.
If there are 100 different .doc and .xls files like that, then it is 40GB... not a size that is easy to manage.
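As a rough check of that arithmetic, here is a small sketch. It assumes the worst case, where every check-in stores a complete new copy of the file, and the function name is just made up for illustration:

```python
def worst_case_history_mb(file_size_mb: float, checkins: int, files: int = 1) -> float:
    """Size of the history if every check-in stores a full copy of every file."""
    return file_size_mb * checkins * files

one_spec_mb = worst_case_history_mb(4, 100)                # one 4MB file, 100 check-ins
hundred_docs_mb = worst_case_history_mb(4, 100, files=100)  # 100 such files

print(f"{one_spec_mb:.0f} MB for one file")
print(f"{hundred_docs_mb / 1000:.0f} GB for 100 files")
```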
I have tried Git and Mercurial, and both seem to add a large amount of data even when 1 line is changed in a .doc or .pdf. Is there another way within Git, Mercurial, or SVN to do the job?
P.S. I tried Dropbox and I could have a 7MB file, and then I highlight a couple of places in the .PDF file, and Dropbox seemed to be able to upload the change in 1 second. My uplink is only about 200kb/s, so I think Dropbox did a pretty good job diff'ing my file. So we can use Dropbox, except there is no version control this way.
What are the advantages of SVN? SVN has one central repository, which makes it easier for managers to take a top-down approach to control, security, permissions, mirrors and dumps. Additionally, many say SVN is easier to use than Git. For example, it is easier to create a new feature.
While SVN is no longer the most used VCS, it has managed to establish itself in a few very niche areas. Features like customizable access control to project files and a central server are some reasons why developers may still be using SVN.
In general, version control systems work better with text files. The whole merge/conflict concept is really based around source code. However, SVN works pretty well for binary files. (We use it to version CAD drawings.)
I will point out that file locking (svn:needs-lock) is pretty much a must-have when there are multiple people working on a common binary file. Without file locking, it is possible for 2 people to work on a binary file at once. Someone commits their changes first. Guess what happens to the person that didn't commit. All of that binary/unmergeable work they did is effectively lost. File locking serializes work on the file. You do lose the "concurrent" access capabilities of a version control system, but you still have the benefits of a commit log, rolling back to a previous version, etc.
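The locking workflow could be sketched roughly like this. The two helper functions are hypothetical and only build the argument lists for the standard svn subcommands (propset, lock, commit); nothing is run here, and the file name is made up:

```python
def needs_lock_setup(path: str) -> list[list[str]]:
    """One-time setup: set svn:needs-lock so the file stays read-only until locked."""
    return [
        ["svn", "propset", "svn:needs-lock", "*", path],
        ["svn", "commit", "-m", f"Require a lock to edit {path}", path],
    ]

def locked_edit(path: str, message: str) -> list[list[str]]:
    """Per-edit cycle: take the lock, edit the file, then commit."""
    return [
        ["svn", "lock", "-m", message, path],
        # ... edit the file in its own application between these two steps ...
        ["svn", "commit", "-m", message, path],  # a commit releases the lock by default
    ]

for command in needs_lock_setup("drawing.dwg") + locked_edit("drawing.dwg", "Resize bracket"):
    print(" ".join(command))
```

Inside a working copy, each list could be handed to subprocess.run(command, check=True) instead of being printed.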
The TortoiseSVN client is smart enough to use MS Word's built-in merge tool to diff a doc/docx file. It also has configuration options to let you specify alternate diff tools based on file extension, which is pretty cool. (It's a shame no one has made a diff tool for our CAD package.)
Current-generation DVCSes like Git or Hg tend to suck with binary files though. They don't have any sort of mechanism for file locking.
Binary diff tools exist; however, they don't help much, since a change of one pixel in an image, or of one character in a Word document, does not correspond to a change of one byte in the file, due to compression. Thus "nice" handling of such binary data is impossible.
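To illustrate the effect, here is a minimal sketch using Python's zlib module as a stand-in for whatever compression a real file format uses. The "document" is made-up text, and count_differing_bytes is a helper written just for this example:

```python
import zlib

def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where a and b differ, plus any difference in length."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))

# Made-up document: 2000 distinct lines of text.
original = "\n".join(f"Line {i}: the quick brown fox" for i in range(2000)).encode()
# Change a single character near the start.
edited = original.replace(b"Line 5:", b"Line 5;", 1)

raw_diff = count_differing_bytes(original, edited)
packed_diff = count_differing_bytes(zlib.compress(original), zlib.compress(edited))

print("bytes differing before compression:", raw_diff)
print("bytes differing after compression: ", packed_diff)
```

The uncompressed versions differ in exactly one byte, while the compressed streams differ in more than one place, which is what makes a byte-level diff of compressed formats so unrewarding.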
If you want to commit such documents, consider committing uncompressed variants: RTF instead of DOC, TeX instead of PDF, etc. If the version control system employs compression to compress its internal repository, then this method should work rather well. For instance, in Git:
"Newly added objects are stored in their entirety using zlib compression."
EDIT: I just wanted to note that even RTF is horrible, but not as horrible as DOC. If you can switch to TXT or TeX for your documents, that would be best.
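As a rough illustration of why plain text behaves so much better, here is a sketch using Python's difflib. The spec contents and file names are made up, and this is not how any particular VCS stores deltas internally; it only shows that a one-line edit to a text file yields a tiny line-based diff:

```python
import difflib

# Made-up plain-text spec with 1000 lines.
original = [f"Requirement {i}: the system shall do thing {i}." for i in range(1000)]
edited = list(original)
edited[500] = "Requirement 500: the system shall do something else."

# n=0 asks for no context lines, so only the changed lines show up.
diff = list(difflib.unified_diff(original, edited, "spec_v1.txt", "spec_v2.txt",
                                 n=0, lineterm=""))
changed = [line for line in diff
           if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]

print("\n".join(diff))
```

Only the old and the new version of the edited line appear in the diff, no matter how long the rest of the document is.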