What is the mathematical structure that represents a Git repo

Tags: git, graph

I am learning about Git, and it would be great if I had a description of the mathematical structure that represents a Git repo. For instance: it's a directed acyclic graph; its nodes represent commits; its nodes have labels (at most one label per node, no label used twice) that represent branches, etc. (I know this description is not correct, I'm just trying to explain what I'm looking for.)

asked Sep 03 '13 08:09

max

1 Answer

In addition to the links in Nevik Rehnel's comment (copied here per request: eagain.net/articles/git-for-computer-scientists and gitolite.com/gcs), and sehe's point that the commit graph forms a Merkle Tree, I'll add a few notes.

There are four object types in the object-store: commit, tree, annotated-tag, and blob (file).
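As a rough illustration of how objects in the object-store get their names: an object's ID is the SHA-1 of a short header (the type, a space, the content length in decimal, and a NUL byte) followed by the content. The helper name git_object_id is made up for this sketch, and it assumes a SHA-1 repository rather than the newer SHA-256 format.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Hypothetical helper: hex SHA-1 of a Git object of the given type."""
    # Header is "<type> <length>\0", followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# The empty blob has a well-known ID.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("blob", b"test content\n"))
```

Because the ID depends only on the type and content, two identical files anywhere in history share one blob.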
A commit object contains exactly one tree reference (that tree can, of course, point to more trees), a possibly-empty list of parent SHA-1 hashes (each of which must name a commit), an author (name, email, and timestamp), a committer (same form as the author), and the commit message.
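A minimal sketch of that commit layout, assuming a SHA-1 repository and plain unsigned commits (no extra headers such as encoding or gpgsig). The Commit class, the IDENT string, and the example messages are invented for illustration; EMPTY_TREE is the well-known ID of the empty tree.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
IDENT = "A U Thor <author@example.com> 1378195740 +0000"  # made-up identity

@dataclass(frozen=True)
class Commit:
    tree: str                       # exactly one tree ID
    parents: Tuple[str, ...] = ()   # zero or more parent commit IDs
    author: str = IDENT
    committer: str = IDENT
    message: str = ""

    def payload(self) -> bytes:
        # Header lines, a blank line, then the message.
        lines = [f"tree {self.tree}"]
        lines += [f"parent {p}" for p in self.parents]
        lines += [f"author {self.author}", f"committer {self.committer}", "", self.message]
        return "\n".join(lines).encode("utf-8")

    def object_id(self) -> str:
        body = self.payload()
        return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()

root = Commit(tree=EMPTY_TREE, message="initial empty commit\n")
child = Commit(tree=EMPTY_TREE, parents=(root.object_id(),), message="second commit\n")
print(root.object_id(), len(root.parents))
print(child.object_id(), child.parents)
```

Since the parent IDs are part of the hashed payload, a commit's ID pins down its entire ancestry, which is what makes the graph Merkle-like.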
A tree object contains zero or more (mode, sub-object, filename) entries. If the sub-object is another tree, the filename names a directory; if it's a blob, it names a file. The mode looks like a POSIX file mode, and if it's 120000 (the file mode for a symlink), the blob's "contents" are really the symlink target. Mode 160000 (a "gitlink") is (ab)used for submodules, in which case the sub-object is a commit in the submodule's repository. R and W mode bits are not stored, only the X bit (and even that is ignored if the repo configuration, core.fileMode, says to ignore it).
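The tree layout can be sketched as follows, assuming a SHA-1 repository: each entry is stored as the mode in octal text, a space, the filename, a NUL byte, and the 20-byte binary ID of the sub-object. The function names and the example file names are made up; the mode table lists the values Git uses in practice (160000 being the "gitlink" mode for submodules).

```python
import hashlib

MODE_KINDS = {
    "40000": "directory (sub-tree)",
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symlink (blob holds the target path)",
    "160000": "submodule (gitlink to a commit)",
}

def sort_key(entry):
    # Git orders entries by name, comparing directories as if they ended in "/".
    mode, name, _ = entry
    return (name + "/" if mode == "40000" else name).encode("utf-8")

def tree_payload(entries):
    """entries: iterable of (mode, name, hex_id) tuples."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        out += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00" + bytes.fromhex(hex_id)
    return out

def tree_id(entries):
    body = tree_payload(entries)
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # ID of the empty blob
entries = [
    ("100755", "run.sh", EMPTY_BLOB),
    ("100644", "README", EMPTY_BLOB),
]
for mode, name, _ in entries:
    print(name, "->", MODE_KINDS[mode])
print(tree_id(entries))
```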
An annotated-tag object contains an object reference, a tagger (name, email, and timestamp), and the tag text. The referenced object is normally a commit but a tag object can point to any object (even another tag object).
The labels (branches and tags and reflog-references and so on) live outside the object-store. For annotated tags, there's a label outside, pointing to the annotated tag object inside the object-store. For a lightweight tag, the outside label points right to a commit.
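One way to picture this split is as two maps: the object-store maps IDs to objects, and the labels map names to IDs. The sketch below uses short placeholder IDs ("c1", "t1", "a1") instead of real hashes, and the ref names are only examples; peel is a hypothetical helper that follows annotated tags until it reaches a non-tag object.

```python
# Toy object-store: ID -> object. Real IDs are SHA-1 hashes; these are placeholders.
objects = {
    "c1": {"type": "commit", "tree": "t1", "parents": []},
    "t1": {"type": "tree", "entries": []},
    "a1": {"type": "tag", "object": "c1"},  # annotated tag pointing at commit c1
}

# Labels live outside the object-store: name -> object ID.
refs = {
    "refs/heads/main": "c1",       # branch
    "refs/tags/v1.0-light": "c1",  # lightweight tag: points straight at a commit
    "refs/tags/v1.0": "a1",        # annotated tag: points at a tag object
}

def peel(object_id):
    """Follow tag objects until a non-tag object is reached."""
    while objects[object_id]["type"] == "tag":
        object_id = objects[object_id]["object"]
    return object_id

for name, target in refs.items():
    print(name, "->", target, "peels to", peel(target))
```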
There is no restriction that there be only one root commit. Any commit with no parents is a root.
Git almost never makes an empty tree (which would represent an empty directory), with two exceptions: every repo implicitly has an empty tree at all times, and if you make an initial empty commit (with git commit --allow-empty), it uses that empty tree. (Since the empty tree has no sub-objects, its SHA-1 hash value is a constant: 4b825dc642cb6eb9a060e54bf8d69288fbee4904.)
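The constant can be reproduced by hand, assuming a SHA-1 repository: a tree with no entries has an empty payload, so its ID is just the hash of the header "tree 0" plus a NUL byte.

```python
import hashlib

payload = b""  # a tree with no entries
empty_tree_id = hashlib.sha1(b"tree %d\x00" % len(payload) + payload).hexdigest()
print(empty_tree_id)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```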
The "DAG" description is generally meant for the graph formed by following commit parent hashes. However, a tree object should in general not contain itself in any of its subtrees, and if you managed to make a cyclic tree structure you would not be able to check it out (because the checkout would recurse infinitely). Assuming you cannot make two different trees with the same checksum (if you could, you'd break git), you won't find a tree T1 that contains a tree T2 that in turn contains a tree whose checksum is T1. So the trees are implicitly a DAG too, and being attached to the commit DAG, they form a bigger DAG. :-)
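To make the commit-graph part concrete, here is a small sketch that stores a history as a map from each commit to its parents, finds the roots, and checks for cycles. The commit names are placeholders rather than real hashes, and the standard-library graphlib module (Python 3.9+) stands in for whatever traversal Git itself uses.

```python
from graphlib import TopologicalSorter, CycleError

# commit -> list of parent commits
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],  # a merge commit with two parents
    "X": [],          # a second, unrelated root
    "Y": ["X", "D"],  # merging the two histories together
}

def is_dag(graph):
    """True if following parent links never leads back to the starting commit."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

roots = sorted(c for c, ps in parents.items() if not ps)
# static_order() yields every parent before any of its children.
order = list(TopologicalSorter(parents).static_order())

print("roots:", roots)
print("one valid order:", order)
print("acyclic:", is_dag(parents))
```

In a real repository a cycle cannot be built without a hash collision, since a commit's ID depends on its parents' IDs.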
Unreferenced objects in the object-store will get garbage-collected by git gc. The empty tree appears to be immune to collection. Anything in the refs/ and logs/ directories and the file packed-refs (in .git, or for bare repos or when $GIT_DIR is set, wherever else) acts as a reference, as do the special names (HEAD, ORIG_HEAD, etc.); I'm not sure if other random files, if created in .git and containing valid SHA-1s, would act as references, or not.
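The idea behind the collection can be sketched as a reachability walk: start from every reference, follow the links between objects, and treat whatever is never visited as garbage. This is a simplified model with made-up object names; real git gc also accounts for things like reflogs, the index, and a grace period for recent objects.

```python
def reachable(starts, edges):
    """Return every object reachable from the starting IDs via edges."""
    seen = set()
    stack = list(starts)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(edges.get(oid, ()))
    return seen

# object -> objects it refers to (commit -> tree and parents, tree -> entries)
edges = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blobA", "blobB"],
    "tree1": ["blobA"],
    "blobA": [],
    "blobB": [],
    "blobOrphan": [],        # added once, never committed
    "commitOld": ["tree1"],  # abandoned commit with no label pointing at it
}
refs = {"HEAD": "commit2"}

live = reachable(refs.values(), edges)
garbage = set(edges) - live
print(sorted(garbage))  # ['blobOrphan', 'commitOld']
```

Note that tree1 survives even though commitOld is collected, because commit1 still refers to it.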
The index has some format I've never dug into. It contains references to objects in the object store. When you git add a file, git writes the file's contents into the object-store as a blob and places the (binary, not hex text) SHA-1 hash into the index file. These are valid references that prevent garbage collection.

answered Nov 08 '22 08:11

torek
