
git fsck: how --dangling vs. --unreachable vs. --lost-found differ?

Tags:

git

I've recently found out about git fsck, but the linked answers and git help fsck list several alternative options, some of which seem to mean the same thing to an untrained eye. To be able to use the tool well, I'd love to learn what the difference is between the commands below:

  • git fsck --dangling
  • git fsck --unreachable
  • git fsck --lost-found

Also, can/should they be used together in some combinations, or better not?

(As a side note, I'm particularly interested in using this in git log -G$REGEX $(git fsck --something), to cast the net as wide as possible, in a faint hope of finding something I remember writing at some point, but that I can't find with a git log -G$REGEX -a.)

akavel asked Apr 14 '16 11:04



1 Answer

Part of the answer is in the git glossary, where we find this:

dangling object

An unreachable object which is not reachable even from other unreachable objects; a dangling object has no references to it from any reference or object in the repository.

(all links theirs). Reachability (follow their link if you like) is a basic concept in git's commit graph, where we start with some external reference like a branch or tag name to get starting points within the graph, then follow the outbound edges from each node to find all other nodes.

(There's a glossary entry for ref but not for reference, but reference just has its regular dictionary meaning here.)

I think this is best explained illustratively, though. Suppose we have a commit DAG that looks like this:

     C--D--E      <-- branch-a
    /
A--B--F---G--H    <-- branch-b
    \    /
     I--J--K--L   <-- branch-c

Nodes always point left-ish, while possibly also pointing up or down, so node E, for instance, points back at D, which points at C, which points at B, which points at A. (A points nowhere: it is a root node.) Node G is a merge and points back at both F and J. Every node in this graph is reachable: we start from all the external references (branches) and walk left-ish and discover that nodes A through E are on branch-a; nodes A, B, and F through H, plus I and J by way of the merge G, are on branch-b; and so on. (Note that nodes A and B are on every branch. The fact that a node can be on many branches is one of the things that is a bit unusual about git. In mercurial, for instance, each node is only ever on one branch. In this particular way, git's branches are fluid while mercurial's are fixed.)
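The walk described above can be sketched in a few lines of Python. This is only a toy model of the example graph, not how git itself is implemented; the names PARENTS, BRANCHES, and reachable_from are made up for illustration.

```python
# Toy model of the example DAG: each commit maps to the commits it points back at.
PARENTS = {
    "A": [], "B": ["A"],
    "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["B"], "G": ["F", "J"], "H": ["G"],   # G is the merge of F and J
    "I": ["B"], "J": ["I"], "K": ["J"], "L": ["K"],
}
# External references (branch names) and the commit each one points to.
BRANCHES = {"branch-a": "E", "branch-b": "H", "branch-c": "L"}


def reachable_from(tip, parents=PARENTS):
    """Return every commit found by following parent links from tip."""
    seen, stack = set(), [tip]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


for name, tip in BRANCHES.items():
    print(name, "".join(sorted(reachable_from(tip))))
# branch-a ABCDE
# branch-b ABFGHIJ
# branch-c ABIJKL
```

Commits A and B show up in all three results, which is the "a node can be on many branches" point made above.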

Now let's see what happens if we erase one of the branch labels. Let's peel off the branch-a label first.

Commit E no longer has anything pointing to it. It is unreachable, and also—in git's term here—dangling. Commit D has only commit E pointing to it. Since E is unreachable, D is also unreachable, but D is not dangling, because E points to D. C is in the same state as D. Node B, on the other hand, is reachable from branch-b by following H to G to F to B, and by following H to G to J to I to B, and from branch-c by following L to K to J to I to B.

Let's put the branch-a label back (so that C through E are reachable again) and peel off branch-c instead. This time L and K become unreachable. Node J remains reachable, though, by starting with branch-b and working from H to G to J. Of the K and L commits, only L is dangling: nothing points to L, while K still has L pointing to it.
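Both label-peeling experiments can be reproduced with a small Python sketch. It models only commits in the example graph (no trees, blobs, tags, or reflogs), and the names PARENTS and classify are hypothetical, chosen for this illustration rather than taken from git.

```python
# Toy model of the example DAG: each commit maps to the commits it points back at.
PARENTS = {
    "A": [], "B": ["A"],
    "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["B"], "G": ["F", "J"], "H": ["G"],
    "I": ["B"], "J": ["I"], "K": ["J"], "L": ["K"],
}


def classify(ref_tips, parents=PARENTS):
    """Return (reachable, unreachable, dangling) commit sets for the given ref tips."""
    reachable, stack = set(), list(ref_tips)
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(parents[node])
    unreachable = set(parents) - reachable
    # Anything that appears as some commit's parent has an inbound pointer.
    pointed_to = {p for ps in parents.values() for p in ps}
    # Dangling = unreachable and not pointed to by any other object.
    dangling = unreachable - pointed_to
    return reachable, unreachable, dangling


# Remaining branch tips after peeling off one label at a time.
for removed, refs in [("branch-a", ["H", "L"]), ("branch-c", ["E", "H"])]:
    _, unreachable, dangling = classify(refs)
    print(removed, "removed:",
          "unreachable =", "".join(sorted(unreachable)),
          "dangling =", "".join(sorted(dangling)))
# branch-a removed: unreachable = CDE dangling = E
# branch-c removed: unreachable = KL dangling = L
```

In both cases the dangling set is a strict subset of the unreachable set: only the tip of each abandoned chain is dangling.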

When using git fsck, as I noted in that other answer, --lost-found "resurrects" (some) dangling objects by writing their IDs or contents into .git/lost-found/.

(Remember that commits point back to previous commits, while blobs are just file contents and never point to anything. You get dangling commits when you delete a branch, or when rebased-and-thus-abandoned commit chains lose their reflog reference, for instance. You get dangling blobs when you git add a file's contents, then either git reset it or git add new contents without committing first. Both are pretty normal. git fsck does not save dangling tree or tag objects. Normally there should be no dangling trees: tree objects can only point to more trees and to blobs, and a tree is normally pointed to by a commit, so to get a dangling tree you have to run git write-tree manually and then never reference the resulting tree. I'm not sure why tag objects are not resurrected, since accidentally deleting the external reference for an annotated tag will result in dangling tag objects, and it might be nice to be able to get those back.)

Summary: git fsck detection and restoration of dangling or unreferenced objects

Unreachable objects are those not reachable from external references (principally branch and tag names, though there are others like refs/stash, used by git stash). Dangling objects are a subset of unreachable objects, specifically those with no inbound arcs (in graph theoretic terms).

Adding the --lost-found flag will save the IDs of dangling commits—which makes these commits, and hence any additional unreferenced commits, all referenced again—and decompress and make available all dangling blob objects.
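To see why saving only the dangling commit IDs is enough to get back the rest of an abandoned chain, here is a minimal Python sketch on the example graph with the branch-a label removed. It is a toy model under that assumption, not git's code, and the names PARENTS and walk are made up for illustration.

```python
# Toy model of the example DAG: each commit maps to the commits it points back at.
PARENTS = {
    "A": [], "B": ["A"],
    "C": ["B"], "D": ["C"], "E": ["D"],
    "F": ["B"], "G": ["F", "J"], "H": ["G"],
    "I": ["B"], "J": ["I"], "K": ["J"], "L": ["K"],
}


def walk(tips, parents=PARENTS):
    """Return every commit found by following parent links from any of the tips."""
    seen, stack = set(), list(tips)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


remaining_refs = ["H", "L"]                  # branch-b and branch-c; branch-a is gone
unreachable = set(PARENTS) - walk(remaining_refs)
pointed_to = {p for ps in PARENTS.values() for p in ps}
dangling = unreachable - pointed_to          # the only IDs that need to be saved

# Treat the saved dangling IDs as new starting points and walk again.
recovered = walk(dangling)
print(sorted(dangling))                      # ['E']
print(sorted(unreachable - recovered))       # []
```

Starting from E alone, the walk finds D and C again, so no unreachable commit is left over.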

torek answered Oct 24 '22 15:10