Do different versions of a file get their own blob/sha?

Tags:

git

If I read correctly, Git stores all its files in blobs. If you modify a file, does the modified version of the file get its own blob and therefore its own SHA?

Pickels asked May 08 '11 16:05


2 Answers

That's correct - if the file's content changes even by a single bit, it will have a new object name (a.k.a. the SHA1sum or hash). You can see the object name that the file would have with git hash-object, if you want to test that:

 $ git hash-object text.txt
 9dbcaae0abd0d45c30bbb1a77410fb31aedda806

You can find out more about how the hashes for blobs are calculated here:

  • Why does git hash-object return a different hash than openssl sha1?
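
To make the hashing rule concrete, here is a minimal Python sketch of how a blob's object name is derived, assuming a repository that uses the default SHA-1 object format. Git hashes a short header (`blob`, a space, the content length in bytes, and a NUL byte) followed by the content, which is why the result differs from a plain SHA-1 of the file. The function name `git_blob_hash` and the sample contents are made up for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the object name Git would give a blob with this content.

    Assumes the default SHA-1 object format.
    """
    # Header is "blob <size in bytes>\0", prepended to the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


original = b"test content\n"
modified = b"test content!\n"  # a one-character change

print(git_blob_hash(original))
print(git_blob_hash(modified))
# The two object names differ because the content differs.
print(git_blob_hash(original) == git_blob_hash(modified))  # False
```

Running `git hash-object` on a file holding the same bytes should print the same name as `git_blob_hash`.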
Mark Longair answered Nov 16 '22 03:11


I would like to add to Mark's answer.

While Subversion, CVS, and even Mercurial use delta storage, whereby they store only the differences between commits, Git takes a snapshot of the tree with each commit.

When a file's content changes, a new blob for that content is added to the object store. Git only cares about the content at this point, not the filename. The filename and path are tracked through tree objects. When a file changes and is added to the index, the blob for the new content is created. When you commit (or use low-level commands like git write-tree), a new tree object is written in which the file points to the new content. Note also that while every change to a file creates a new blob for it, files with the same content will never get different blobs.
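
The split between content (blobs) and names (trees) can be sketched with a toy in-memory store. This is only an illustration under simplifying assumptions, not how Git is implemented: `ToyStore`, `blob_id`, and the file names are hypothetical, the "tree" is a single mutable dictionary rather than an immutable tree object per commit, and SHA-1 object names are assumed.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object name for a blob, assuming the default SHA-1 object format."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyStore:
    """Hypothetical, simplified model of blobs plus one tree."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # object name -> content
        self.tree: dict[str, str] = {}     # path -> object name

    def add(self, path: str, content: bytes) -> str:
        oid = blob_id(content)
        # Identical content maps to the same key, so it is stored only once.
        self.blobs.setdefault(oid, content)
        # The path lives in the tree, not in the blob.
        self.tree[path] = oid
        return oid


store = ToyStore()
v1 = store.add("notes.txt", b"first draft\n")
v2 = store.add("notes.txt", b"second draft\n")  # modified: new blob
copy = store.add("copy.txt", b"first draft\n")  # same content, other name
v3 = store.add("notes.txt", b"first draft\n")   # reverted: old blob reused

print(v1 != v2)          # True
print(copy == v1 == v3)  # True
print(len(store.blobs))  # 2
```

Only two blobs exist at the end, even though `add` was called four times on two different paths.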

So, to your question:

If you modify a file, does the modified version of the file get its own blob and therefore its own SHA?

The new content gets a new blob, and the file is pointed to the new blob. If the new content is the same as that of an existing blob, the file is simply pointed to the existing one.

PS: Note that Git "packs" these "loose objects" into pack files (where Git stores deltas from one version of a file to another) when there are too many loose objects around, when git gc is run manually, or when pushing to a remote server, so files can end up being stored as deltas. See the Pro Git chapter on this for more info: http://progit.org/book/ch9-4.html

manojlds answered Nov 16 '22 01:11