
Do version control systems use diffs to store binary files?

How do popular version control systems (svn, git) handle storing revisions to a binary document? I have projects with binary sources that are periodically updated and need to be checked in (mostly Photoshop documents, custom data formats and a few word processing documents). I've always been worried about checking in the binaries because I thought that the VCS might take the simple route of uploading a new copy of the binary each time, so my repository would get huge quickly.

If I have several data blocks (let's call them A, B, C, D, etc.) and I have a binary file that looks like ABC on the first check-in but has been modified to ADBE by the second check-in, will my VCS be smart enough to store only the changed bits, or will it create an entirely new image of the file?

kitfox asked Feb 07 '23 03:02



1 Answer

tl;dr

Git can store binary files as deltas, but it isn't very efficient at it, so you should probably use an external tool such as Git LFS.

Slightly longer explanation

By default, git doesn't store diffs between commits. When you change a file and make a new commit, git stores a new object containing the content of the whole file. It doesn't matter whether you change just one line or rewrite the whole file: git doesn't store diffs, at least not initially.

There is a piece of git called git-gc (the garbage collector) that is responsible for tasks such as removing dangling commits and optimizing the repository. It runs another git command, git-repack, which does exactly what you are asking for: it takes a whole bunch of objects and stores them inside one pack using delta compression.

Unfortunately, packing with git-repack is not especially efficient when it comes to compressing binary files. You can tune it (for example, with its --window and --depth options), but if your files change a lot or are really big, you should probably use an external tool such as Git LFS.

qzb answered Feb 09 '23 11:02
