
git: dangling blobs

I recently ran git fsck --lost-found on my repository.

I expected to see a couple dangling commits, where I had reset HEAD.

However, I was surprised to see what looked like several thousand dangling blob messages.

I don't believe anything is wrong with my repository, but I'm curious what causes these dangling blobs. There are only two people working on the repository, and we haven't done anything out of the ordinary.

I wouldn't think they were created by an older version of a file being replaced by a new one, since git would need to hold onto both blobs so it can display history.

Come to think of it, at one point we did add a VERY large directory (thousands of files) to the project by mistake and then remove it. Might this be the source of all the dangling blobs?

Just looking for insight into this mystery.

wadesworld asked Mar 31 '12 12:03



3 Answers

Last time I looked at this I stumbled across this thread, specifically this part:

You can also end up with dangling objects in packs. When that pack is repacked, those objects will be loosened, and then eventually expired under the rule mentioned above. However, I believe gc will not always repack old packs; it will make new packs until you have a lot of packs, and then combine them all (at least that is what "gc --auto" will do; I don't recall whether just "git gc" follows the same rule).

So it's normal behavior, and does get collected eventually, I believe.

Edit: per Daniel, you can collect them immediately by running:

git gc --prune="0 days"
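For anyone who wants to watch this happen, here is a minimal sketch in a throwaway repository. The file names and the Example identity are made up for illustration, and it uses --prune=now, which is the spelling the git gc documentation gives for pruning regardless of age, rather than "0 days".

```shell
set -e
repo=$(mktemp -d)   # throwaway repository, not your real one
cd "$repo"
git init --quiet .

# One committed file, so there is something reachable that gc must keep.
echo "kept" > kept.txt
git add kept.txt
git -c user.name="Example" -c user.email="example@example.com" \
    commit --quiet -m "add kept.txt"

# One blob that is staged and then unstaged, which leaves it dangling.
echo "scratch" > scratch.txt
git add scratch.txt
git rm --cached --quiet scratch.txt

# Count "dangling blob" lines before and after an immediate prune.
before=$(git fsck 2>/dev/null | grep -c '^dangling blob' || true)
git gc --quiet --prune=now
after=$(git fsck 2>/dev/null | grep -c '^dangling blob' || true)
echo "dangling blobs before gc: $before, after gc: $after"
```

Keep in mind that pruning immediately also throws away anything you might still have wanted to recover from those unreferenced objects.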
Waynn Lue answered Oct 07 '22 06:10


I was really impatient and used:

git gc --prune="0 days"
Daniel answered Oct 07 '22 05:10


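Whenever you add a file to the index, the contents of that file are added to Git's object database as a blob. When you then reset or rm --cached that file, the blob still exists but nothing references it any more, which is why fsck reports it as dangling. It will be garbage collected by a later gc, once it is older than the prune grace period (two weeks by default).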
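A small sketch of that first case, run in a scratch repository so nothing real is touched (the file name is just an example):

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init --quiet .

echo "staged but never committed" > scratch.txt
git add scratch.txt                   # writes the blob into .git/objects
git rm --cached --quiet scratch.txt   # unstages it; the blob stays behind

# Count the "dangling blob" lines that fsck reports.
dangling=$(git fsck 2>/dev/null | grep -c '^dangling blob' || true)
echo "dangling blobs: $dangling"
```

Adding a directory with thousands of files and then removing it before committing would leave one such blob per distinct file content, which would fit the numbers in the question.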

However, when those files are part of a commit and you decide later to reset history, then the old commits are still reachable from Git's reflog and will only be garbage collected after a period of time (usually a month, iirc). Those objects should not show up as dangling though, since they are still referenced from the reflog.
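The reflog part can be checked the same way. This is only a sketch with made-up file contents and a placeholder identity; commit_as_example is a helper defined here, not a git command. It relies on fsck treating reflog entries as starting points by default and on the --no-reflogs option to turn that off.

```shell
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init --quiet .

# Hypothetical helper: commit with a placeholder identity.
commit_as_example() {
    git -c user.name="Example" -c user.email="example@example.com" \
        commit --quiet -m "$1"
}

echo "one" > file.txt; git add file.txt; commit_as_example "first"
echo "two" > file.txt; git add file.txt; commit_as_example "second"

# Drop the second commit from the branch; the reflog still records it.
git reset --quiet --hard HEAD~1

# Default fsck follows the reflog, so the reset commit is not dangling.
with_reflog=$(git fsck 2>/dev/null | grep -c '^dangling commit' || true)
# Ignoring the reflog makes the reset commit show up.
without_reflog=$(git fsck --no-reflogs 2>/dev/null | grep -c '^dangling commit' || true)
echo "dangling commits: $with_reflog with reflog, $without_reflog without"
```

Only the commit is listed as dangling in the second run; its tree and blob are unreachable too, but they are still referenced by that commit.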

knittl answered Oct 07 '22 05:10