What happens to orphaned commits?

I have a repo with four commits:

$ git log --oneline --decorate
6c35831 (HEAD, master) C4
974073b C3
e27b22c C2
9f2d694 C1

I did a soft reset (git reset --soft) to the C2 commit, and now my repo looks like this:

$ git reset e27b22c --soft

$ git log --oneline --decorate
e27b22c (HEAD, master) C2
9f2d694 C1

Now I add an extra commit, so the log looks like this:

$ git log --oneline --decorate
545fa99 (HEAD, master) C5
e27b22c C2
9f2d694 C1

What happened to commits C3 and C4? I haven't deleted them, so I assume they are still there; C3's parent is still C2.

asked May 06 '15 at 22:05 by BanksySan


3 Answers

Short answer: Commits C3 and C4 will remain in the Git object database until they are garbage collected.

Long answer: Garbage collection runs automatically as a side effect of various Git porcelain commands, or explicitly when you ask for it. There are many scenarios that can trigger an automatic garbage collection; take a look at the gc.* configuration settings to get an idea. You can garbage collect explicitly with the built-in git gc command. Let's look at an example to see what happens.
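To see which gc.* settings apply in a given repository, you can query them with git config. This is a sketch for a POSIX shell; the helper name show_gc_settings is hypothetical, and the fallback values are the defaults documented for recent Git versions as far as I know, so double-check them against git help config for your version.

```shell
#!/bin/sh
# show_gc_settings (hypothetical helper): print the gc.* settings that decide
# when automatic garbage collection runs and what it may delete.
# Fallback values are the documented defaults at the time of writing; check
# `git help config` for your Git version.
show_gc_settings() {
    for pair in \
        gc.auto=6700 \
        gc.autoPackLimit=50 \
        gc.pruneExpire=2.weeks.ago \
        gc.reflogExpire=90.days.ago \
        gc.reflogExpireUnreachable=30.days.ago
    do
        key=${pair%%=*}        # text before the first "="
        default=${pair#*=}     # text after the first "="
        if value=$(git config --get "$key"); then
            echo "$key = $value (configured)"
        else
            echo "$key = $default (default)"
        fi
    done
}

show_gc_settings
```

For example, git config gc.auto 0 disables automatic garbage collection in that repository, which can be handy while you experiment with orphaned commits.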

First, let's set up our environment (I am using Linux; make changes as necessary for your environment) so we hopefully get the same object hashes in different Git repositories.

export GIT_AUTHOR_NAME='Wile E. Coyote'
export GIT_AUTHOR_EMAIL='[email protected]'
export GIT_AUTHOR_DATE=2015-01-01T12:00:00
export GIT_COMMITTER_NAME='Roadrunner'
export GIT_COMMITTER_EMAIL='[email protected]'
export GIT_COMMITTER_DATE=2015-01-01T12:00:00

Since commit object hashes are generated using this information, if we use the same author and committer values, we should all now get the same hashes.
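To convince yourself that identical author and committer information yields identical hashes, a minimal sketch like the following creates the same empty commit in two throwaway repositories and compares the results. The helper name make_commit_hash and the example.com addresses are placeholders, so the hash will not match the ones shown in this answer; the dates carry no explicit time zone, so both runs also need the same TZ.

```shell
#!/bin/sh
# make_commit_hash (hypothetical helper): create a throwaway repository, make
# one empty commit with a fixed identity and fixed dates, print its hash.
# The email addresses are placeholders, not the ones used in the answer.
make_commit_hash() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        export GIT_AUTHOR_NAME='Wile E. Coyote'
        export GIT_AUTHOR_EMAIL='coyote@example.com'
        export GIT_AUTHOR_DATE=2015-01-01T12:00:00
        export GIT_COMMITTER_NAME='Roadrunner'
        export GIT_COMMITTER_EMAIL='roadrunner@example.com'
        export GIT_COMMITTER_DATE=2015-01-01T12:00:00
        git init -q
        git commit -q --allow-empty -m C1
        git rev-parse HEAD
    )
    rm -rf "$repo"
}

first=$(make_commit_hash)
second=$(make_commit_hash)
[ "$first" = "$second" ] && echo "identical: $first"
```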

Now let's define a function that logs object information using git log, git reflog, git count-objects, git rev-list and git fsck.

function git_log_objects () {
    echo 'Log ...'
    git log --oneline --decorate
    echo 'Reflog ...'
    git reflog show --all
    echo 'Count ...'
    git count-objects -v
    echo 'Hashes ...'
    # See: https://stackoverflow.com/a/7350019/649852
    {
        git rev-list --objects --all --reflog
        git rev-list --objects -g --no-walk --all
        git rev-list --objects --no-walk $(
            git fsck --unreachable 2>/dev/null \
                | grep '^unreachable commit' \
                | cut -d' ' -f3
        )
    } | sort | uniq
}

Now let's initialize a Git repository.

git --version
git init
git_log_objects

Which, for me, outputs:

git version 2.4.0
Initialized empty Git repository in /tmp/test/.git/
Log ...
fatal: bad default revision 'HEAD'
Reflog ...
fatal: bad default revision 'HEAD'
Count ...
count: 0
size: 0
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...

As expected, we have an initialized repository with no objects in it. Let's make some commits and take a look at the objects.

git commit --allow-empty -m C1
git commit --allow-empty -m C2
git tag T1
git commit --allow-empty -m C3
git commit --allow-empty -m C4
git commit --allow-empty -m C5
git_log_objects

Which gives me the following output:

[master (root-commit) c11e156] C1
 Author: Wile E. Coyote <[email protected]>
[master 10bfa58] C2
 Author: Wile E. Coyote <[email protected]>
[master 8aa22b5] C3
 Author: Wile E. Coyote <[email protected]>
[master 1abb34f] C4
 Author: Wile E. Coyote <[email protected]>
[master d1efc10] C5
 Author: Wile E. Coyote <[email protected]>
Log ...
d1efc10 (HEAD -> master) C5
1abb34f C4
8aa22b5 C3
10bfa58 (tag: T1) C2
c11e156 C1
Reflog ...
d1efc10 refs/heads/master@{0}: commit: C5
1abb34f refs/heads/master@{1}: commit: C4
8aa22b5 refs/heads/master@{2}: commit: C3
10bfa58 refs/heads/master@{3}: commit: C2
c11e156 refs/heads/master@{4}: commit (initial): C1
Count ...
count: 6
size: 24
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...
10bfa58a7bcbadfc6c9af616da89e4139c15fbb9
1abb34f82523039920fc629a68d3f82bc79acbd0
4b825dc642cb6eb9a060e54bf8d69288fbee4904 
8aa22b5f0fed338dd13c16537c1c54b3496e3224
c11e1562835fe1e9c25bf293279bff0cf778b6e0
d1efc109115b00bac9d4e3d374a05a3df9754551

Now we have six objects in the repository: five commits and one empty tree. We can see Git has branch, tag and/or reflog references to all five commit objects. As long as Git references an object, that object will not be garbage collected, so explicitly running a garbage collection now would not remove any objects from the repository. (I'll leave verifying this as an exercise for you.)
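One way to do that exercise is a short script. This sketch assumes a POSIX shell and a throwaway repository with placeholder identities; total_objects is a hypothetical helper that adds the loose ("count") and packed ("in-pack") figures from git count-objects -v. Because every commit is still reachable, the total should be the same before and after git gc; the objects are merely moved from loose files into a pack.

```shell
#!/bin/sh
# total_objects (hypothetical helper): loose objects ("count") plus packed
# objects ("in-pack"), as reported by `git count-objects -v`.
total_objects() {
    git count-objects -v |
        awk '$1 == "count:" || $1 == "in-pack:" { n += $2 } END { print n + 0 }'
}

# Throwaway repository with five empty commits, all reachable from the
# current branch. Identities are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME='Wile E. Coyote' GIT_AUTHOR_EMAIL='coyote@example.com'
export GIT_COMMITTER_NAME='Roadrunner' GIT_COMMITTER_EMAIL='roadrunner@example.com'
for msg in C1 C2 C3 C4 C5; do
    git commit -q --allow-empty -m "$msg"
done

before=$(total_objects)
git gc -q --prune=now      # prunes immediately, but only unreachable objects
after=$(total_objects)
echo "before gc: $before objects, after gc: $after objects"
```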

Now let's remove Git references to the C3, C4 and C5 commits.

git reset --soft T1
git reflog expire --expire=all --all
git_log_objects

Which outputs:

Log ...
10bfa58 (HEAD -> master, tag: T1) C2
c11e156 C1
Reflog ...
Count ...
count: 6
size: 24
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...
10bfa58a7bcbadfc6c9af616da89e4139c15fbb9
1abb34f82523039920fc629a68d3f82bc79acbd0
4b825dc642cb6eb9a060e54bf8d69288fbee4904 
8aa22b5f0fed338dd13c16537c1c54b3496e3224
c11e1562835fe1e9c25bf293279bff0cf778b6e0
d1efc109115b00bac9d4e3d374a05a3df9754551

Now we see only two commits are being referenced by Git. However, all six objects are still in the repository. They will remain in the repository until they are automatically or explicitly garbage collected. You could even, for example, revive an unreferenced commit with git cherry-pick or look at it with git show. For now though, let's explicitly garbage collect the unreferenced objects and see what Git does behind the scenes.
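As an aside, here is a sketch of finding and reviving such an unreferenced commit before it is garbage collected. It builds its own throwaway repository in the same state (placeholder identities, no fixed dates, so the hashes will differ) and locates the tip of the orphaned chain with git fsck, which reports an unreachable commit that no other unreachable commit points to as "dangling". Instead of git cherry-pick, which would create new commits, it points a new branch at the old tip; the branch name revived is an arbitrary choice.

```shell
#!/bin/sh
# Throwaway repository in the same state as above: C3..C5 are not referenced
# by any branch, tag or reflog entry. Identities are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME='Wile E. Coyote' GIT_AUTHOR_EMAIL='coyote@example.com'
export GIT_COMMITTER_NAME='Roadrunner' GIT_COMMITTER_EMAIL='roadrunner@example.com'
git commit -q --allow-empty -m C1
git commit -q --allow-empty -m C2
git tag T1
for msg in C3 C4 C5; do
    git commit -q --allow-empty -m "$msg"
done
git reset -q --soft T1
git reflog expire --expire=all --all

# fsck prints "dangling commit <hash>" for unreachable commits that no other
# unreachable commit points to, i.e. the tip of the orphaned chain (C5).
tip=$(git fsck --no-reflogs 2>/dev/null |
    awk '$1 == "dangling" && $2 == "commit" { print $3 }')

# Look at it without changing anything...
git show --no-patch --oneline "$tip"

# ...then make C3..C5 reachable again, with their original hashes, via a new branch.
git branch revived "$tip"
git log --oneline revived
```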

GIT_TRACE=1 git gc --aggressive --prune=now

This will output a bit of information.

11:03:03.123194 git.c:348               trace: built-in: git 'gc' '--aggressive' '--prune=now'
11:03:03.123625 run-command.c:347       trace: run_command: 'pack-refs' '--all' '--prune'
11:03:03.124038 exec_cmd.c:129          trace: exec: 'git' 'pack-refs' '--all' '--prune'
11:03:03.126895 git.c:348               trace: built-in: git 'pack-refs' '--all' '--prune'
11:03:03.128298 run-command.c:347       trace: run_command: 'reflog' 'expire' '--all'
11:03:03.128635 exec_cmd.c:129          trace: exec: 'git' 'reflog' 'expire' '--all'
11:03:03.131322 git.c:348               trace: built-in: git 'reflog' 'expire' '--all'
11:03:03.133179 run-command.c:347       trace: run_command: 'repack' '-d' '-l' '-f' '--depth=250' '--window=250' '-a'
11:03:03.133522 exec_cmd.c:129          trace: exec: 'git' 'repack' '-d' '-l' '-f' '--depth=250' '--window=250' '-a'
11:03:03.136915 git.c:348               trace: built-in: git 'repack' '-d' '-l' '-f' '--depth=250' '--window=250' '-a'
11:03:03.137179 run-command.c:347       trace: run_command: 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--window=250' '--depth=250' '--no-reuse-delta' '--local' '--delta-base-offset' '.git/objects/pack/.tmp-8973-pack'
11:03:03.137686 exec_cmd.c:129          trace: exec: 'git' 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--window=250' '--depth=250' '--no-reuse-delta' '--local' '--delta-base-offset' '.git/objects/pack/.tmp-8973-pack'
11:03:03.140367 git.c:348               trace: built-in: git 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--window=250' '--depth=250' '--no-reuse-delta' '--local' '--delta-base-offset' '.git/objects/pack/.tmp-8973-pack'
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), done.
Total 3 (delta 1), reused 0 (delta 0)
11:03:03.153843 run-command.c:347       trace: run_command: 'prune' '--expire' 'now'
11:03:03.154255 exec_cmd.c:129          trace: exec: 'git' 'prune' '--expire' 'now'
11:03:03.156744 git.c:348               trace: built-in: git 'prune' '--expire' 'now'
11:03:03.159210 run-command.c:347       trace: run_command: 'rerere' 'gc'
11:03:03.159527 exec_cmd.c:129          trace: exec: 'git' 'rerere' 'gc'
11:03:03.161807 git.c:348               trace: built-in: git 'rerere' 'gc'
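The trace shows that git gc mostly drives other built-in commands. As a rough sketch, the same clean-up can be run step by step with a hypothetical manual_gc function; the flags are copied from the trace above. Newer Git versions may run additional steps (for example writing a commit-graph, or using cruft packs for unreachable objects), so treat this as an illustration rather than an exact replacement for git gc.

```shell
#!/bin/sh
# manual_gc (hypothetical helper): run, one at a time, roughly the steps that
# the trace shows `git gc --aggressive --prune=now` delegating to.
manual_gc() {
    # Move loose refs into .git/packed-refs.
    git pack-refs --all --prune &&
    # Drop reflog entries that are past their expiry times.
    git reflog expire --all &&
    # Pack everything reachable from refs, reflogs and the index into one
    # pack and delete loose objects that are now also in the pack.
    git repack -q -d -l -f -a --depth=250 --window=250 &&
    # Delete the remaining loose objects that nothing references.
    git prune --expire now &&
    # Expire old recorded merge-conflict resolutions (nothing to do if
    # rerere has never been used).
    git rerere gc
}
```

Running manual_gc in the test repository right after the git reset --soft T1 and git reflog expire step, then calling git_log_objects, should leave the same three objects (two commits and the empty tree) in a single pack.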

And finally, let's look at the objects.

git_log_objects

Which outputs:

Log ...
10bfa58 (HEAD -> master, tag: T1) C2
c11e156 C1
Reflog ...
Count ...
count: 0
size: 0
in-pack: 3
packs: 1
size-pack: 1
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...
10bfa58a7bcbadfc6c9af616da89e4139c15fbb9
4b825dc642cb6eb9a060e54bf8d69288fbee4904 
c11e1562835fe1e9c25bf293279bff0cf778b6e0

Now we see we only have three objects: the two commits and one empty tree.
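To double-check which objects survived, git cat-file -e tests whether a single object exists: it exits with status 0 if the object exists and non-zero otherwise. The report_objects wrapper below is a hypothetical convenience function around it.

```shell
#!/bin/sh
# report_objects (hypothetical helper): for each object name given, say
# whether it can still be found in the current repository.
# `git cat-file -e` exits 0 if the object exists, non-zero otherwise.
report_objects() {
    for obj in "$@"; do
        if git cat-file -e "$obj" 2>/dev/null; then
            echo "$obj present"
        else
            echo "$obj missing"
        fi
    done
}
```

In the test repository, report_objects 10bfa58 8aa22b5 1abb34f d1efc10 should report the C2 commit as present and the three pruned commits as missing.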

answered Nov 09 '22 at 10:11 by Dan Cruz


Run git show 6c35831 to see that C4, for instance, is still there. Run git reflog master to see (lots of) what master used to reference. One of the entries (master@{1} most likely, but perhaps an older one if you have made other changes as well) should correspond to 6c35831, and git show master@{1} (or whichever entry it is) should show the same output as the first git show command I mentioned.
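Rather than counting reflog entries by hand, you can let git log -g walk master's reflog and print the selector of every entry that points at a given commit. find_reflog_entry is a hypothetical helper; it relies on the %H (commit hash) and %gd (shortened reflog selector) format placeholders.

```shell
#!/bin/sh
# find_reflog_entry (hypothetical helper): print every reflog selector of
# BRANCH (e.g. master@{2}) whose entry points at COMMIT.
# Usage: find_reflog_entry BRANCH COMMIT
find_reflog_entry() {
    branch=$1
    target=$(git rev-parse --verify --quiet "$2^{commit}") || return 1
    # -g walks the reflog; %H is the full commit hash, %gd the selector.
    git log -g --format='%H %gd' "$branch" |
        awk -v t="$target" '$1 == t { print $2 }'
}
```

For example, find_reflog_entry master 6c35831 prints the selector(s) you can pass to git show. In a fresh repository that followed exactly the question's steps (four commits, the soft reset, then C5), it would print master@{2}, because the reset and the new commit each added an entry after C4.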

answered Nov 09 '22 at 09:11 by chepner


Orphaned commits just stay there until they are garbage collected, either automatically or by explicitly running git gc.
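In the question's exact scenario, an explicit git gc will not remove C3 and C4 right away: after git reset --soft they are still recorded in the reflogs of master and HEAD, and reflog entries count as references. Entries no longer reachable from the branch tip expire after gc.reflogExpireUnreachable (30 days by default), which is why the longer answer above expires the reflog before running gc. A sketch to verify this in a throwaway repository with placeholder identities:

```shell
#!/bin/sh
# Throwaway repository reproducing the question: four commits, a soft reset
# back to C2, then a new commit C5. Identities are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
export GIT_AUTHOR_NAME='Wile E. Coyote' GIT_AUTHOR_EMAIL='coyote@example.com'
export GIT_COMMITTER_NAME='Roadrunner' GIT_COMMITTER_EMAIL='roadrunner@example.com'
for msg in C1 C2 C3 C4; do
    git commit -q --allow-empty -m "$msg"
done
c4=$(git rev-parse HEAD)
git reset -q --soft HEAD~2
git commit -q --allow-empty -m C5

# Explicit gc with immediate pruning. The reflog entries for C3 and C4 are
# younger than the reflog expiry times, so they keep C4 reachable.
git gc -q --prune=now

if git cat-file -e "$c4"; then
    echo "C4 ($c4) survived git gc"
else
    echo "C4 ($c4) was removed"
fi
```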

answered Nov 09 '22 at 10:11 by Mureinik