I was reading about how git stores changes in The Git Object Model¹.
It sounds like if I change one line in a file, it's going to re-store the entire file. Does this waste a lot of space compared to, say, Subversion, which only stores diffs?
(Or am I misunderstanding the storage model?)
¹ As of 2011, when the question was asked. The closest current link is Git Internals - Git Objects.
Git is very efficient at storing text files, and it only stores new content for the files that actually changed.
In its object model, Git stores the full contents of each file version for tracking history, rather than the differences between versions. Each version is referenced by a 40-character SHA-1 hash of its contents (strictly, of a short type-and-size header followed by the contents), which means the ID is practically guaranteed to be unique.
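To make the content-addressing idea concrete, here is a minimal Python sketch of how a blob ID is derived, assuming Git's default SHA-1 object format. The function name git_blob_id is hypothetical; for the same bytes it should agree with what git hash-object --stdin prints. Identical contents always produce the same ID, which is why an unchanged file is never stored twice.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git would assign to a blob with these bytes.

    Hypothetical helper. Git hashes a header ("blob", a space, the size in
    decimal, then a NUL byte) followed by the raw contents.
    """
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    v1 = b"line one\nline two\n"
    v2 = b"line one\nline 2\n"  # one line changed
    print(git_blob_id(v1))
    print(git_blob_id(v1) == git_blob_id(b"line one\nline two\n"))  # True: same content, same ID
    print(git_blob_id(v1) == git_blob_id(v2))  # False: any change gives a new ID
```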
Git does use deltas for storage. Not only that, but it is arguably more efficient at it than any other system.
When you commit, git stores a snapshot of the entire file; it does not store a diff from the previous commit. As a repository grows, the number of objects grows with every commit, and it becomes inefficient to store the data as individual loose object files (each one is zlib-compressed on its own, but nothing is shared between them).
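The question's worry can be reproduced with a small Python sketch of loose-object storage. It assumes Git's usual layout of objects/<first 2 hex digits>/<remaining 38> with each object zlib-compressed; the helper names write_loose_blob and count_objects are hypothetical, and this is an illustration rather than a substitute for git hash-object -w.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_blob(objects_dir: str, content: bytes) -> str:
    """Store content as a zlib-compressed loose object (hypothetical helper)."""
    data = b"blob %d\x00" % len(content) + content
    oid = hashlib.sha1(data).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    if not os.path.exists(path):  # identical content is stored only once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return oid


def count_objects(objects_dir: str) -> int:
    """Count the object files under objects_dir (hypothetical helper)."""
    return sum(len(files) for _, _, files in os.walk(objects_dir))


if __name__ == "__main__":
    lines = [b"line %d of a fairly long file\n" % i for i in range(1000)]
    v1 = b"".join(lines)
    lines[500] = b"this one line was edited\n"
    v2 = b"".join(lines)

    with tempfile.TemporaryDirectory() as objects_dir:
        write_loose_blob(objects_dir, v1)
        write_loose_blob(objects_dir, v1)  # unchanged file: no new object
        print(count_objects(objects_dir))  # 1
        write_loose_blob(objects_dir, v2)  # one-line edit: a whole new object
        print(count_objects(objects_dir))  # 2
```

Until the objects are packed, the one-line edit really does cost a second compressed copy of the whole file.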
Git will eventually pack everything into delta-compressed archives (packfiles) during the regular course of its internal maintenance, for example when git gc runs, which Git also triggers automatically from time to time. At that point this is no longer an issue.
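To see why packing makes the one-line edit cheap, here is a toy Python sketch of delta encoding: the new version is described as "copy this byte range from the old version" plus "insert these new bytes". This is an assumption-laden illustration, not Git's binary packfile delta format or its heuristics for choosing a base object. It uses the standard library's difflib and matches line by line for simplicity, and the names make_delta, apply_delta and literal_bytes are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import accumulate


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as toy ("copy", offset, size) / ("insert", data) ops."""
    base_lines = base.splitlines(keepends=True)
    target_lines = target.splitlines(keepends=True)
    # Byte offset at which each base line starts, so copies are byte ranges.
    starts = [0, *accumulate(len(line) for line in base_lines)]
    ops = []
    matcher = SequenceMatcher(None, base_lines, target_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", starts[i1], starts[i2] - starts[i1]))
        elif j2 > j1:  # "replace" or "insert": the new bytes must be stored
            ops.append(("insert", b"".join(target_lines[j1:j2])))
        # "delete": nothing to emit, those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base plus the toy delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, size = op
            out += base[offset:offset + size]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    """Bytes stored verbatim; each copy op costs only a few bytes."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


if __name__ == "__main__":
    lines = [b"line %d of a fairly long file\n" % i for i in range(1000)]
    v1 = b"".join(lines)
    lines[500] = b"this one line was edited\n"
    v2 = b"".join(lines)

    delta = make_delta(v1, v2)
    assert apply_delta(v1, delta) == v2
    print(len(v2))
    print(literal_bytes(delta))  # 25
```

Here the edited file of about 30 KB is reduced to two copy instructions plus the 25 bytes of the edited line, which is the kind of saving packing recovers.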
This isn't really an issue today, though. Git's philosophy is that disk space is cheap, and that it's better to optimize for speed than for storage efficiency. Chances are you'll be better served by an SCM that is twice as fast than by one that requires half the disk space.
See the Git Book's chapter on The Packfile, as well as the documentation for git repack and git pack-objects.