How does git store files?

Tags:

git

People also ask

How is data stored in Git?

Git doesn't think of or store its data as a list of file-based changes. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.

Where are git files stored?

Git stores the complete history of your files for a project in a special directory (a.k.a. a folder) called a repository, or repo. This repo is usually in a hidden folder called .git sitting next to your files.

How does Git keep track of files?

The tree object is how Git keeps track of file names and directories. There is a tree object for each directory. As of the commit, the tree object points to the blobs (the files in that directory, each identified by its SHA-1) and to other trees (its sub-directories).
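The tree structure described above can be sketched in a few lines of Python. This is a simplified illustration, not Git's own code: the names `hash_object` and `write_tree` are made up for this example, only regular files (mode 100644) and directories (mode 40000) are handled, and nothing is written to disk.

```python
import hashlib

def hash_object(kind: str, body: bytes) -> bytes:
    """Return the raw 20-byte SHA-1 of '<kind> <size>\\0' followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).digest()

def write_tree(directory: dict) -> bytes:
    """Hash a nested dict ({name: bytes or dict}) as trees pointing at blobs and sub-trees."""
    entries = []
    for name, value in directory.items():
        if isinstance(value, dict):
            # Sub-directory: the entry points at another tree object.
            entries.append((name + "/", b"40000", name, write_tree(value)))
        else:
            # Regular file: the entry points at a blob object.
            entries.append((name, b"100644", name, hash_object("blob", value)))
    # Entries are sorted by name; directories compare as if they ended in "/".
    entries.sort(key=lambda e: e[0].encode())
    body = b"".join(mode + b" " + name.encode() + b"\x00" + oid
                    for _, mode, name, oid in entries)
    return hash_object("tree", body)

# Hypothetical project layout, used only for illustration.
project = {"README": b"hello\n", "src": {"main.py": b"print('hi')\n"}}
print(write_tree(project).hex())
```

Because each tree's identifier depends on the identifiers of everything below it, changing one file changes the root tree, while untouched sub-directories keep their old identifiers.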

Does Git store files or diffs?

No, commit objects in git don't contain diffs - instead, each commit object contains a hash of the tree, which recursively and completely defines the content of the source tree at that commit.

Git does include for each commit a full copy of all the files, except that, for the content already present in the Git repo, the snapshot will simply point to said content rather than duplicate it.
That also means that several files with the same content are stored only once.

So a snapshot is basically a commit, referring to the content of a directory structure.
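A toy model may help make this concrete. The sketch below is not how Git is implemented: `SnapshotStore` is a hypothetical class, it hashes raw content rather than Git's real object format, and it keeps everything in memory. It only shows how each commit can list every file while identical content is stored once.

```python
import hashlib

class SnapshotStore:
    """Toy content-addressed store: every commit is a full manifest of the project."""

    def __init__(self):
        self.objects = {}   # content hash -> file content
        self.commits = []   # one manifest per commit: {path: content hash}

    def commit(self, files: dict) -> dict:
        manifest = {}
        for path, content in files.items():
            oid = hashlib.sha1(content).hexdigest()
            # Content that is already stored is referenced, not duplicated.
            self.objects.setdefault(oid, content)
            manifest[path] = oid
        self.commits.append(manifest)
        return manifest

store = SnapshotStore()
store.commit({"a.txt": b"alpha\n", "b.txt": b"beta\n", "copy_of_a.txt": b"alpha\n"})
store.commit({"a.txt": b"alpha\n", "b.txt": b"beta v2\n", "copy_of_a.txt": b"alpha\n"})

# Two complete snapshots of three files each, but only three distinct contents.
print(len(store.commits), len(store.objects))  # prints "2 3"
```

Each manifest still describes the whole project, so any snapshot can be restored without replaying a chain of changes.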

Some good references are:

git.github.io/git-reference

You tell Git you want to save a snapshot of your project with the git commit command and it basically records a manifest of what all of the files in your project look like at that point.

2020: "A commit in Git: Is it a snapshot/state/image or is it a change/diff/patch/delta?"
git immersion

Lab 12 illustrates how to get previous snapshots

"You could have invented git (and maybe you already have!)"
What is a git “Snapshot”?
Learn GitHub

The Pro Git book has a more comprehensive description of a snapshot:

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data.
Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time.

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem.
Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
To be efficient, if files have not changed, Git doesn’t store the file again—just a link to the previous identical file it has already stored.
Git thinks about its data more like a stream of snapshots.

This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.

See also:

"If git functions off of snapshots of files, why doesn't .git/ become huge over time?"
"What information is stored as each git commit's tree object content"

Jan Hudec adds this important comment:

While that's true and important on the conceptual level, it is NOT true at the storage level.
Git does use deltas for storage.
Not only that, but it is more efficient at it than any other system. Because it does not keep per-file history, when it wants to do delta compression it takes each blob, selects some blobs that are likely to be similar (using heuristics that include the closest approximation of the previous version and some others), tries to generate the deltas, and picks the smallest one. This way it can (often, depending on the heuristics) take advantage of other similar files, or of older versions that are more similar than the previous one. The "pack window" parameter allows trading performance for delta compression quality. The default (10) generally gives decent results, but when space is limited or to speed up network transfers, git gc --aggressive uses a value of 250, which makes it run very slowly but provides extra compression for history data.
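The "try several candidate bases and keep the smallest delta" idea can be sketched as follows. Everything here is an illustrative assumption: the copy/insert instruction list is not Git's packfile delta encoding, `difflib` stands in for Git's own matching, the cost function is a rough guess, and the helper names are invented for this example.

```python
import difflib

def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # "replace" or "insert": the new bytes are carried literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are just not copied.
    return ops

def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from a base and the instruction list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

def delta_size(ops: list) -> int:
    """Rough cost model (assumed): 8 bytes per copy, 1 + literal length per insert."""
    return sum(8 if op[0] == "copy" else 1 + len(op[1]) for op in ops)

def best_base(target: bytes, candidates: list, window: int = 10):
    """Try the last `window` candidates (must be non-empty) and keep the smallest delta."""
    best = None
    for base in candidates[-window:]:
        ops = make_delta(base, target)
        if best is None or delta_size(ops) < delta_size(best[1]):
            best = (base, ops)
    return best

v1 = b"line one\nline two\nline three\n" * 5
unrelated = b"completely different content\n" * 5
v2 = v1 + b"line four\n"

base, ops = best_base(v2, [unrelated, v1])
print(base == v1, delta_size(ops) < len(v2))
```

A larger `window` means more candidates are tried per object, which is the performance-for-compression trade-off the comment describes.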

Git logically stores each file under its SHA1. What this means is if you have two files with exactly the same content in a repository (or if you rename a file), only one copy is stored.
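As a small illustration, the identifier of a file's content can be reproduced in Python. Git hashes a short header ("blob", the size, and a NUL byte) followed by the content; the function name `git_blob_id` is made up for this sketch, and the result should match what `git hash-object` reports for the same bytes.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 of 'blob <size>\\0' + content, as a 40-character hex string."""
    header = b"blob " + str(len(content)).encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

# The file name is not part of the hash, so a renamed or duplicated file
# maps to the same object.
old_name = {"notes.txt": b"hello world\n"}
new_name = {"docs/notes.md": b"hello world\n"}
ids = {git_blob_id(c) for c in list(old_name.values()) + list(new_name.values())}

print(len(ids))  # prints 1: both paths share one stored object
```

Only the tree that lists the file changes on a rename; the blob itself is reused.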

But this also means that when you modify a small part of a file and commit, another copy of the file is stored. The way git solves this is using pack files. Once in a while, all the “loose” files (actually, not just files, but objects containing commit and directory information too) from a repo are gathered and compressed into a pack file. The pack file is compressed using zlib. And similar files are also delta-compressed.
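Before packing, each "loose" object is an individual zlib-compressed file whose path is derived from its hash. The sketch below shows that loose form only; it does not implement the pack file format. The helper names are hypothetical, and it writes to a temporary directory rather than a real .git/objects folder.

```python
import hashlib
import os
import tempfile
import zlib

def write_loose_object(objects_dir: str, kind: str, body: bytes) -> str:
    """Store '<kind> <size>\\0' + body, zlib-compressed, under <first 2 hex>/<remaining 38>."""
    data = f"{kind} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(data).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))
    return oid

def read_loose_object(objects_dir: str, oid: str):
    """Decompress a loose object and split it back into (kind, body)."""
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    header, _, body = data.partition(b"\x00")
    kind, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind, body

with tempfile.TemporaryDirectory() as objects_dir:
    v1 = write_loose_object(objects_dir, "blob", b"line 1\nline 2\n")
    v2 = write_loose_object(objects_dir, "blob", b"line 1\nline 2 changed\n")
    # A small edit produces a second, complete object until packing happens.
    print(v1 != v2, read_loose_object(objects_dir, v2))
```

This is why a small edit temporarily costs a whole extra copy: the two versions are separate loose objects until they are delta-compressed into a pack.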

The same format is also used when pulling or pushing (at least with some protocols), so those files don't have to be recompressed again.

The result of this is that a git repository, containing the whole uncompressed working copy, uncompressed recent files, and compressed older files, is usually relatively small: smaller than two times the size of the working copy. This means it is smaller than an SVN repo with the same files, even though SVN doesn't store the history locally.
