
Does every commit create a new tree object in Git?

Tags: git

I am learning about git internals and how git object model works "under the hood".

If I change a file and commit it in a local git repository, a new git commit object is created. Each commit object has an associated tree object, and each tree object contains the SHA-1 hashes of the files (blobs) it points to. So does this mean that every new commit (assuming there is some file change in it) will always generate a new tree object, with a different SHA-1 than all previous trees even though they point to the same directory on the file system?

Is my reasoning about this correct? Also, is it possible to commit without file changes? In that case there wouldn't be a need for a new tree object, but I don't know if this type of commit is possible in git.

matori82 asked Apr 20 '19 21:04




2 Answers

Let's take things one step at a time.

Every time you add a file to your repository, usually by adding it to the index and then committing, a snapshot of the whole file is stored as a blob object. A hash is calculated from the file's contents, and this hash is the identifier for that blob.

However, if 5-6 commits down the line you restore a file's contents to what they were previously, the resulting hash will already exist in the repository, so no additional object is added. Instead, whatever refers to this file will use that hash and thus refer to the "old" blob.
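
As a minimal sketch of this content addressing, the following script hashes three files in a throwaway repository. The file names and the temporary directory are assumptions made for illustration only.

```shell
set -e

# Throwaway repository (the location is an assumption for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Two files with identical content, one with different content.
printf 'hello\n' > first.txt
printf 'hello\n' > second.txt
printf 'goodbye\n' > third.txt

# hash-object computes the blob hash from the content alone.
hash_first=$(git hash-object first.txt)
hash_second=$(git hash-object second.txt)
hash_third=$(git hash-object third.txt)

echo "first:  $hash_first"
echo "second: $hash_second"
echo "third:  $hash_third"

# Writing both identical files into the object database stores one object.
git hash-object -w first.txt second.txt > /dev/null
object_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "objects stored: $object_count"
```

The first two hashes come out identical because the file name plays no part in a blob's hash, which is why restoring old content never adds a second copy.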

Tree objects are just lists of entries: the mode, name, and hash of each file in the directory, as well as the hashes that identify sub-trees (sub-folders). The hash of a tree object is also calculated from the contents of the tree, and thus depends on the hashes of the files and the hashes of the sub-trees.
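
A rough sketch of how a change in a sub-folder propagates upward, using git write-tree on the index (the directory layout and file names here are hypothetical):

```shell
set -e

# Throwaway repository (the location is an assumption for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir docs
printf 'top\n' > readme.txt
printf 'nested\n' > docs/guide.txt
git add .

# Write the index out as tree objects and show the root tree's entries.
root_tree=$(git write-tree)
git cat-file -p "$root_tree"
docs_tree_before=$(git rev-parse "$root_tree:docs")
readme_before=$(git rev-parse "$root_tree:readme.txt")

# Change only the nested file, then write the trees again.
printf 'nested, edited\n' > docs/guide.txt
git add .
new_root_tree=$(git write-tree)
docs_tree_after=$(git rev-parse "$new_root_tree:docs")
readme_after=$(git rev-parse "$new_root_tree:readme.txt")

echo "root tree: $root_tree -> $new_root_tree"
echo "docs tree: $docs_tree_before -> $docs_tree_after"
echo "readme:    $readme_before -> $readme_after"
```

The docs sub-tree and the root tree both get new hashes, while the untouched readme.txt blob keeps its hash and is simply referenced again.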

In other words, extending the scenario above where we restored a file: if we end up restoring the contents of all the files in a repository to the state they had in a previous commit, the hash of the new tree will already exist and no new tree object will be added. Instead, whatever refers to that tree, most likely a commit, will use the hash and refer to the "old" tree.

In most cases, this is probably a bit theoretical. It is probably not a scenario you will encounter very often that you end up restoring all the files back to some older state. So in practice, every time you create a commit you will most likely also create and add one or more new tree objects as well.
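
For the rare case described above, a small sketch shows the tree being reused once the content is restored. The identity, file name, and commit messages are placeholders chosen for this example.

```shell
set -e

# Throwaway repository with a placeholder identity (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'a\n' > test.txt
git add test.txt
git commit -q -m "original content"

printf 'b\n' > test.txt
git commit -q -am "changed content"

printf 'a\n' > test.txt
git commit -q -am "content restored"

# <rev>^{tree} resolves a commit to the tree it points to.
tree_original=$(git rev-parse 'HEAD~2^{tree}')
tree_changed=$(git rev-parse 'HEAD~1^{tree}')
tree_restored=$(git rev-parse 'HEAD^{tree}')

echo "original: $tree_original"
echo "changed:  $tree_changed"
echo "restored: $tree_restored"
```

The first and third commits are different commit objects, yet they point to the same tree.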

To add a commit without file changes, known as an "empty commit", you can use this git command:

git commit --allow-empty

You can tack on things like -m "message" or the like as you normally would.

Here's an example:

λ git init .
Initialized empty Git repository in D:/Temp/.git/

λ echo a >test.txt                                                             
λ git add .                                                                    
λ git commit -m test1                                                          
[master (root-commit) dc613fe] test1                                           
 1 file changed, 1 insertion(+)                                                
 create mode 100644 test.txt                                                   

λ git commit -m test2 --allow-empty                                            
[master c197192] test2                                                         

λ git lg                                                                       
* c197192: (7 seconds ago) test2 (HEAD -> master)                              
| Lasse Vågsæther Karlsen <[email protected]> (Sat, 20 Apr 2019 23:28:44 +0200)
|                                                                              
* dc613fe: (17 seconds ago) test1                                              
  Lasse Vågsæther Karlsen <[email protected]> (Sat, 20 Apr 2019 23:28:34 +0200)

Now, if I output the contents of those two commits:

λ git cat-file -p c197192
tree 35b422a71005d59dd6af858a3425b608b63f7b5a
parent dc613fe57276009b399d8152a657cb971fad605a
author Lasse Vågsæther Karlsen <[email protected]> 1555795724 +0200
committer Lasse Vågsæther Karlsen <[email protected]> 1555795724 +0200

test2

λ git cat-file -p dc613fe
tree 35b422a71005d59dd6af858a3425b608b63f7b5a
author Lasse Vågsæther Karlsen <[email protected]> 1555795714 +0200
committer Lasse Vågsæther Karlsen <[email protected]> 1555795714 +0200

test1

You can see that they both refer to the exact same tree object, which looks like this:

λ git cat-file -p 35b422a71005d59dd6af858a3425b608b63f7b5a
100644 blob f5eea678d87a8664e4c76e12d3ef5c4ff775ad58    test.txt
Lasse V. Karlsen answered Oct 10 '22 04:10


Is my reasoning about this correct?

Practically, yes, but see below.

Also, is it possible to commit without file changes? In that case there wouldn't be a need for a new tree object, but I don't know if this type of commit is possible in git.

@Lasse already mentioned git commit --allow-empty as a way to reuse the last tree, but that is quite an unusual command. A much more common one is git commit --amend when you just want to fix the last commit message: the replacement commit points to the same tree as the commit it replaces.
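
A minimal sketch of the --amend case, assuming a throwaway repository and a placeholder identity (both are illustrative, not from the answer):

```shell
set -e

# Throwaway repository with a placeholder identity (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'a\n' > test.txt
git add test.txt
git commit -q -m "tset1"          # commit message with a typo

commit_before=$(git rev-parse HEAD)
tree_before=$(git rev-parse 'HEAD^{tree}')

# Fix only the message; no file content changes.
git commit -q --amend -m "test1"

commit_after=$(git rev-parse HEAD)
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "commit: $commit_before -> $commit_after"
echo "tree:   $tree_before -> $tree_after"
```

The commit hash changes because the message is part of the commit object, but the tree hash stays the same.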

Also note: Existing trees can be reused and these trees do not need to be from the last commit. A common scenario is git rebase --interactive and just rewording the commit messages (similar to git commit --amend but for commits further away from HEAD).

Another scenario: Consider this commit sequence:

commit 0
commit A
commit B
commit C
revert C  # will reuse tree from B
revert B  # will reuse tree from A
revert A  # will reuse tree from 0

In this case old trees are reused as well.
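
The revert sequence can be reproduced with a short script. This is only a sketch: the file names, the base file in "commit 0", and the identity are assumptions, and each commit touches its own file so the reverts apply without conflicts.

```shell
set -e

# Throwaway repository with a placeholder identity (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'base\n' > base.txt
git add base.txt
git commit -q -m "commit 0"

# Commits A, B and C each add one file.
for name in A B C; do
  printf '%s\n' "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "commit $name"
done

commit_C=$(git rev-parse HEAD)
commit_B=$(git rev-parse HEAD~1)
commit_A=$(git rev-parse HEAD~2)
tree_B=$(git rev-parse 'HEAD~1^{tree}')
tree_A=$(git rev-parse 'HEAD~2^{tree}')
tree_0=$(git rev-parse 'HEAD~3^{tree}')

# Revert in reverse order and record the tree after each revert.
git revert --no-edit "$commit_C" > /dev/null
after_revert_C=$(git rev-parse 'HEAD^{tree}')
git revert --no-edit "$commit_B" > /dev/null
after_revert_B=$(git rev-parse 'HEAD^{tree}')
git revert --no-edit "$commit_A" > /dev/null
after_revert_A=$(git rev-parse 'HEAD^{tree}')

echo "revert C tree: $after_revert_C (B was $tree_B)"
echo "revert B tree: $after_revert_B (A was $tree_A)"
echo "revert A tree: $after_revert_A (0 was $tree_0)"
```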

The next scenario: git merge -s ours (not to be confused with git merge -X ours) will merge another branch but ignore any changes. In other words: the merge-commit and the first parent share the same tree.
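
A sketch of the -s ours case, assuming a hypothetical branch named feature and placeholder file names:

```shell
set -e

# Throwaway repository with a placeholder identity (assumptions for this sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

printf 'base\n' > base.txt
git add base.txt
git commit -q -m "base"
main_branch=$(git symbolic-ref --short HEAD)   # master or main, depending on configuration

# Diverge: one commit on a feature branch, one on the main branch.
git checkout -q -b feature
printf 'feature work\n' > feature.txt
git add feature.txt
git commit -q -m "feature work"

git checkout -q "$main_branch"
printf 'mainline work\n' > main.txt
git add main.txt
git commit -q -m "mainline work"

tree_before_merge=$(git rev-parse 'HEAD^{tree}')

# Record the merge but keep our own content unchanged.
git merge -q -s ours -m "merge feature, keep ours" feature

tree_after_merge=$(git rev-parse 'HEAD^{tree}')
second_parent=$(git rev-parse 'HEAD^2')
feature_tip=$(git rev-parse feature)

echo "tree before merge: $tree_before_merge"
echo "tree after merge:  $tree_after_merge"
```

The merge commit has the feature branch as its second parent, yet its tree is the one the first parent already had.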

The Swiss Army knife for doing strange things is, of course, git filter-branch, which lets you rewrite commits in several ways while leaving the trees untouched.

A.H. answered Oct 10 '22 04:10