I have a repo with four commits:
$ git log --oneline --decorate
6c35831 (HEAD, master) C4
974073b C3
e27b22c C2
9f2d694 C1
I reset -- soft
to the C2
commit and now I have a repo like so:
$ git reset e27b22c --soft
$ git log --oneline --decorate
e27b22c (HEAD, master) C2
9f2d694 C1
Now I add an extra commit, so the log looks like this:
$ git log --oneline --decorate
545fa99 (HEAD, master) C5
e27b22c C2
9f2d694 C1
What happened to commits C3
and C4
? I haven't deleted them, so I assume they are still there, C3
's parent is still C2
.
Short answer: Commits C3
and C4
will remain in the Git object database until they are garbage collected.
Long answer: Garbage collection will occur automatically by different Git porcelain commands or when explicitly garbage collected. There are many scenarios that could trigger an automatic garbage collection; take a look at the gc.*
configuration settings to get an idea. You can explicitly gabage collect using the git gc
builtin command. Let's look at an example to see what happens.
First, let's set up our environment (I am using Linux; make changes as necessary for your environment) so we hopefully get the same object hashes in different Git repositories.
export GIT_AUTHOR_NAME='Wile E. Coyote'
export [email protected]
export GIT_AUTHOR_DATE=2015-01-01T12:00:00
export GIT_COMMITTER_NAME='Roadrunner'
export [email protected]
export GIT_COMMITTER_DATE=2015-01-01T12:00:00
Since commit object hashes are generated using this information, if we use the same author and committer values, we should all now get the same hashes.
Now let's initialize a function to log object information using git log
, git reflog
, git count-objects
, git rev-list
and git fsck
.
function git_log_objects () {
echo 'Log ...'
git log --oneline --decorate
echo 'Reflog ...'
git reflog show --all
echo 'Count ...'
git count-objects -v
echo 'Hashes ...'
# See: https://stackoverflow.com/a/7350019/649852
{
git rev-list --objects --all --reflog
git rev-list --objects -g --no-walk --all
git rev-list --objects --no-walk $(
git fsck --unreachable 2>/dev/null \
| grep '^unreachable commit' \
| cut -d' ' -f3
)
} | sort | uniq
}
Now let's initialize a Git repository.
git --version
git init
git_log_objects
Which, for me, outputs:
git version 2.4.0
Initialized empty Git repository in /tmp/test/.git/
Log ...
fatal: bad default revision 'HEAD'
Reflog ...
fatal: bad default revision 'HEAD'
Count ...
count: 0
size: 0
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...
As expected, we have an initialized repository with no objects in it. Let's make some commits and take a look at the objects.
git commit --allow-empty -m C1
git commit --allow-empty -m C2
git tag T1
git commit --allow-empty -m C3
git commit --allow-empty -m C4
git commit --allow-empty -m C5
git_log_objects
Which gives me the following output:
[master (root-commit) c11e156] C1
Author: Wile E. Coyote <[email protected]>
[master 10bfa58] C2
Author: Wile E. Coyote <[email protected]>
[master 8aa22b5] C3
Author: Wile E. Coyote <[email protected]>
[master 1abb34f] C4
Author: Wile E. Coyote <[email protected]>
[master d1efc10] C5
Author: Wile E. Coyote <[email protected]>
Log ...
d1efc10 (HEAD -> master) C5
1abb34f C4
8aa22b5 C3
10bfa58 (tag: T1) C2
c11e156 C1
Reflog ...
d1efc10 refs/heads/master@{0}: commit: C5
1abb34f refs/heads/master@{1}: commit: C4
8aa22b5 refs/heads/master@{2}: commit: C3
10bfa58 refs/heads/master@{3}: commit: C2
c11e156 refs/heads/master@{4}: commit (initial): C1
Count ...
count: 6
size: 24
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...
10bfa58a7bcbadfc6c9af616da89e4139c15fbb9
1abb34f82523039920fc629a68d3f82bc79acbd0
4b825dc642cb6eb9a060e54bf8d69288fbee4904
8aa22b5f0fed338dd13c16537c1c54b3496e3224
c11e1562835fe1e9c25bf293279bff0cf778b6e0
d1efc109115b00bac9d4e3d374a05a3df9754551
Now we have six objects in the repository: five commits and one empty tree. We can see Git has branch, tag and/or reflog references to all five commit objects. As long as Git references an object, that object will not be garbage collected. Explicitly running a gabage collection will result in no objects being removed from the repository. (I'll leave verifying this as an exercise for you to complete.)
Now let's remove Git references to the C3
, C4
and C5
commits.
git reset --soft T1
git reflog expire --expire=all --all
git_log_objects
Which outputs:
Log ...
10bfa58 (HEAD -> master, tag: T1) C2
c11e156 C1
Reflog ...
Count ...
count: 6
size: 24
in-pack: 0
packs: 0
size-pack: 0
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...
10bfa58a7bcbadfc6c9af616da89e4139c15fbb9
1abb34f82523039920fc629a68d3f82bc79acbd0
4b825dc642cb6eb9a060e54bf8d69288fbee4904
8aa22b5f0fed338dd13c16537c1c54b3496e3224
c11e1562835fe1e9c25bf293279bff0cf778b6e0
d1efc109115b00bac9d4e3d374a05a3df9754551
Now we see only two commits are being referenced by Git. However, all six objects are still in the repository. They will remain in the repository until they are automatically or explicitly garbage collected. You could even, for example, revive an unreferenced commit with git cherry-pick
or look at it with git show
. For now though, let's explicitly garbage collect the unreferenced objects and see what Git does behind the scenes.
GIT_TRACE=1 git gc --aggressive --prune=now
This will output a bit of information.
11:03:03.123194 git.c:348 trace: built-in: git 'gc' '--aggressive' '--prune=now'
11:03:03.123625 run-command.c:347 trace: run_command: 'pack-refs' '--all' '--prune'
11:03:03.124038 exec_cmd.c:129 trace: exec: 'git' 'pack-refs' '--all' '--prune'
11:03:03.126895 git.c:348 trace: built-in: git 'pack-refs' '--all' '--prune'
11:03:03.128298 run-command.c:347 trace: run_command: 'reflog' 'expire' '--all'
11:03:03.128635 exec_cmd.c:129 trace: exec: 'git' 'reflog' 'expire' '--all'
11:03:03.131322 git.c:348 trace: built-in: git 'reflog' 'expire' '--all'
11:03:03.133179 run-command.c:347 trace: run_command: 'repack' '-d' '-l' '-f' '--depth=250' '--window=250' '-a'
11:03:03.133522 exec_cmd.c:129 trace: exec: 'git' 'repack' '-d' '-l' '-f' '--depth=250' '--window=250' '-a'
11:03:03.136915 git.c:348 trace: built-in: git 'repack' '-d' '-l' '-f' '--depth=250' '--window=250' '-a'
11:03:03.137179 run-command.c:347 trace: run_command: 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--window=250' '--depth=250' '--no-reuse-delta' '--local' '--delta-base-offset' '.git/objects/pack/.tmp-8973-pack'
11:03:03.137686 exec_cmd.c:129 trace: exec: 'git' 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--window=250' '--depth=250' '--no-reuse-delta' '--local' '--delta-base-offset' '.git/objects/pack/.tmp-8973-pack'
11:03:03.140367 git.c:348 trace: built-in: git 'pack-objects' '--keep-true-parents' '--honor-pack-keep' '--non-empty' '--all' '--reflog' '--indexed-objects' '--window=250' '--depth=250' '--no-reuse-delta' '--local' '--delta-base-offset' '.git/objects/pack/.tmp-8973-pack'
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), done.
Total 3 (delta 1), reused 0 (delta 0)
11:03:03.153843 run-command.c:347 trace: run_command: 'prune' '--expire' 'now'
11:03:03.154255 exec_cmd.c:129 trace: exec: 'git' 'prune' '--expire' 'now'
11:03:03.156744 git.c:348 trace: built-in: git 'prune' '--expire' 'now'
11:03:03.159210 run-command.c:347 trace: run_command: 'rerere' 'gc'
11:03:03.159527 exec_cmd.c:129 trace: exec: 'git' 'rerere' 'gc'
11:03:03.161807 git.c:348 trace: built-in: git 'rerere' 'gc'
And finally, let's look at the objects.
git_log_objects
Which outputs:
Log ...
10bfa58 (HEAD -> master, tag: T1) C2
c11e156 C1
Reflog ...
Count ...
count: 0
size: 0
in-pack: 3
packs: 1
size-pack: 1
prune-packable: 0
garbage: 0
size-garbage: 0
Hashes ...
10bfa58a7bcbadfc6c9af616da89e4139c15fbb9
4b825dc642cb6eb9a060e54bf8d69288fbee4904
c11e1562835fe1e9c25bf293279bff0cf778b6e0
Now we see we only have three objects: the two commits and one empty tree.
Run git show 6c35831
to see that C4, for instance, is still there. Run git reflog master
to see (lots of) what master
used to reference. One of the entries (master^{1}
mostly likely, but perhaps one older if you have made other changes as well) should correspond to 6c35831
, and git show master^{1}
(or whichever entry it is) should show the same output of the first git show
command I mentioned.
Orphaned commits just stay there until they are garbage collected by explicitly running git gc
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With