As far as I know, Git names each blob by the SHA-1 hash of its content, so that identical files are not duplicated in the repository.
For example, if file A has the content "abc" and the SHA-1 hash "12345", then as long as the content doesn't change, commits and branches can all point to the same SHA-1.
But what happens if file A is modified to "def", giving it the SHA-1 hash "23456"? Does Git store both file A and the modified file A (the whole file, not just the difference)?
The following from 'Git Community Book' answers most of my questions.
It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.
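To make the snapshot idea concrete, here is a minimal sketch of a content-addressed store in Python. It is a toy model, not Git's implementation: `ToyStore`, `commit`, and the file names are hypothetical, and a "commit" here is only a mapping from path to blob id. The one detail taken from Git is how the blob id is computed (SHA-1 over a `blob <size>\0` header followed by the content).

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name content the way Git names a blob: SHA-1 of a short header plus the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyStore:
    """Hypothetical in-memory stand-in for an object database."""

    def __init__(self):
        self.blobs = {}      # blob id -> full file content
        self.snapshots = []  # one {path: blob id} mapping per commit

    def commit(self, files: dict) -> dict:
        """Record a full snapshot of every file, reusing blobs that already exist."""
        snapshot = {}
        for path, content in files.items():
            oid = blob_id(content)
            self.blobs.setdefault(oid, content)  # identical content is stored once
            snapshot[path] = oid
        self.snapshots.append(snapshot)
        return snapshot


store = ToyStore()
first = store.commit({"A": b"abc", "README": b"hello\n"})
second = store.commit({"A": b"def", "README": b"hello\n"})

print(first["README"] == second["README"])  # True: unchanged file, same blob
print(first["A"] == second["A"])            # False: changed file, new blob
print(len(store.blobs))                     # 3 blobs for 4 file entries
```

Both snapshots list every file, but the unchanged `README` costs nothing extra because both entries point at the same blob.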
A Git blob (binary large object) is the object type used to store the contents of each file in a repository. The SHA-1 hash of the blob's content is computed and used as the object's name.
You can run the git diff HEAD command to compare both staged and unstaged changes with your last commit. You can also run the git diff <branch_name1> <branch_name2> command to compare the first branch with the second branch. Order does matter when you're comparing branches.
Git places only four types of objects in the object store: the blobs, trees, commits, and tags. These four atomic objects form the foundation of Git's higher level data structures. Each version of a file is represented as a blob.
The git status command displays the state of the working directory and the staging area. It lets you see which changes have been staged, which haven't, and which files aren't being tracked by Git. Status output does not show you any information regarding the committed project history.
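Git stores files by content rather than as diffs, so in your example both versions of A ("abc" and "def") would be stored in the object database as two separate blobs. (Git may later delta-compress objects inside packfiles to save space, but that is a storage optimization; logically, each version is still a complete object.)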
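As a rough illustration of why the two versions end up as two objects, the sketch below computes blob names the way Git does for SHA-1 repositories: the hash covers a `blob <size>\0` header followed by the raw content. The function name `git_blob_sha1` is my own; the result should match what `git hash-object` reports for the same bytes.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the SHA-1 object name Git would give a blob with this content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


old = git_blob_sha1(b"abc")
new = git_blob_sha1(b"def")

print(old != new)                    # True: different content, different object
print(git_blob_sha1(b"abc") == old)  # True: same content always maps to the same name
print(git_blob_sha1(b""))            # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

Note that the file name is not part of the hash, so two files with identical content anywhere in the history share one blob.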
It works out better to store whole objects because it is very easy to see whether two versions of a file are the same just by comparing their SHAs. It also means any version can be read directly: if files were tracked with diffs, you would need the entire history of a file to reconstruct it. That is easy to do in a centralised system, but not in a distributed system where there can be many different changes to a file. Have a look at the git-book for details on how the objects are stored.
Git performs the diff directly from the objects.
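In other words, a diff is computed on demand from two stored versions rather than read back from storage. The sketch below shows the idea with Python's `difflib`; it is not Git's diff algorithm, and the `objects` dictionary with its placeholder ids is a hypothetical stand-in for the object database.

```python
import difflib

# Hypothetical object store: placeholder ids mapped to full file contents.
objects = {
    "old-id": b"abc\n",
    "new-id": b"def\n",
}


def diff_objects(store: dict, old_id: str, new_id: str) -> list:
    """Build a unified diff from two complete stored versions of a file."""
    old_lines = store[old_id].decode().splitlines(keepends=True)
    new_lines = store[new_id].decode().splitlines(keepends=True)
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile="a/A", tofile="b/A")
    )


result = diff_objects(objects, "old-id", "new-id")
print("".join(result))
```

Because both versions are available in full, no earlier history is needed to produce the diff.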