Could somebody explain how Git knows internally that files X, Y and Z have changed? What is the process behind the scenes that recognizes when a file has not yet been added or has modifications? I am asking because with Subversion it's simple to figure out: it keeps track of these things by having a .svn directory under each folder. For Git, I can't seem to find a description of the inner workings. I doubt it scans through all the sub-directories for changes, as it's quite fast.
So, out of curiosity, what are its inner workings?
The tree object is how Git keeps track of file names and directories. There is a tree object for each directory. The tree object points to the blobs (the files in that directory, identified by their SHA-1 hashes) and to other trees (the sub-directories), all as they were at the time of the commit.
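To make the idea of a tree object concrete, here is a minimal Python sketch of how a tree's ID can be derived from its entries. It assumes a SHA-1 repository and the usual tree layout (mode, name, NUL byte, raw 20-byte object ID per entry). The names `object_id` and `tree_id` are hypothetical helpers for illustration, not Git APIs.

```python
import hashlib
from typing import Dict, Tuple


def object_id(kind: bytes, body: bytes) -> str:
    """Hash an object the way Git does: '<kind> <size>\\0' header plus body."""
    header = kind + b" " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: Dict[str, Tuple[str, str]]) -> str:
    """entries maps a name to (mode, hex object id).

    Mode "100644" is a regular file (blob), "40000" is a sub-directory (tree).
    """
    def sort_key(name: str) -> bytes:
        mode, _ = entries[name]
        # Git orders directory entries as if their name ended with '/'.
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, hex_id = entries[name]
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return object_id(b"tree", body)


if __name__ == "__main__":
    blob = "83baae61804e65cc73a7201a7252750c76066a30"
    print(tree_id({"test.txt": ("100644", blob)}))
```

Because the tree's ID depends on the IDs of everything it points to, changing one file changes the ID of its directory's tree and of every parent tree above it.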
Diffing is a function that takes two input data sets and outputs the changes between them. git diff is a multi-use Git command that when executed runs a diff function on Git data sources. These data sources can be commits, branches, files and more.
No, there isn't. But you can store in Git a text file with the 2 or 3 commands you use to reconfigure each repository. You can make it a .
Find what file changed in a commit
To find out which files changed in a given commit, use the git log --raw command. It's the fastest and simplest way to get insight into which files a commit affects.
Git keeps track of four types of objects: a blob, a tree, a commit, and a tag. To answer your question on how it keeps track of changes, here's a quote from that link: The tree object is how Git keeps track of file names and directories. There is a tree object for each directory.
You might be asking about how Git stores files that change from one branch to another, and the answer is that it doesn’t. Git stores whole files, and computes diffs and so on after the fact, only when they’re needed. If you move a file, it knows that you’ve done so because the files have the same hash.
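The "same hash" observation can be sketched in a few lines of Python. This is only an illustration of exact-match detection, assuming each snapshot is a plain mapping from path to blob ID. The function name `find_exact_renames` is hypothetical, and Git's real rename detection also considers files that are merely similar.

```python
from typing import Dict, List, Tuple


def find_exact_renames(old: Dict[str, str], new: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair paths that vanished with paths that appeared carrying the same blob ID."""
    gone = {path: blob for path, blob in old.items() if path not in new}
    appeared = {path: blob for path, blob in new.items() if path not in old}

    # Index the newly appeared paths by their content hash.
    by_blob: Dict[str, List[str]] = {}
    for path, blob in sorted(appeared.items()):
        by_blob.setdefault(blob, []).append(path)

    renames = []
    for path, blob in sorted(gone.items()):
        candidates = by_blob.get(blob)
        if candidates:
            renames.append((path, candidates.pop(0)))
    return renames


if __name__ == "__main__":
    before = {"docs/readme.txt": "83baae6", "main.c": "1f7a7a4"}
    after = {"README": "83baae6", "main.c": "1f7a7a4"}
    print(find_exact_renames(before, after))
```

Nothing about the move is recorded at commit time; the pairing is computed afterwards by comparing the two snapshots.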
When you commit, Git stores a snapshot of each file in its entirety; it does not store diffs from the previous commit. As a repository grows, the number of objects grows with every commit, and it becomes inefficient to store them all as loose object files. Hence, Git packs them and stores them in a .pack file.
The mechanism by which one determines the status of a file is fairly straightforward. To know which files have been staged, one simply diffs the HEAD tree with the index. Any items that appear only in the index have been staged for addition, any items that appear only in HEAD have been staged for removal, and any items that differ have had changes staged.
Similarly, one would detect unstaged changes by diffing the index with the working directory.
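As a rough illustration of that comparison, the Python sketch below treats HEAD, the index, and the working directory as simple mappings from path to content hash. The function name `diff_snapshots` and the short hash strings are made up for the example; Git's own data structures are more involved.

```python
from typing import Dict, List


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify paths by comparing two path -> hash mappings."""
    return {
        # Present only on the newer side.
        "added": sorted(p for p in new if p not in old),
        # Present only on the older side.
        "removed": sorted(p for p in old if p not in new),
        # Present on both sides but with different content hashes.
        "modified": sorted(p for p in old if p in new and old[p] != new[p]),
    }


if __name__ == "__main__":
    head = {"a.txt": "aaa1", "b.txt": "bbb1", "c.txt": "ccc1"}
    index = {"a.txt": "aaa1", "b.txt": "bbb2", "d.txt": "ddd1"}
    worktree = {"a.txt": "aaa2", "b.txt": "bbb2", "d.txt": "ddd1"}

    print("staged:", diff_snapshots(head, index))
    print("unstaged:", diff_snapshots(index, worktree))
```

The same function serves both questions: HEAD against the index gives the staged changes, and the index against the working directory gives the unstaged ones.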
Your question in particular asks how this can be so fast (after all, computing the SHA-1 hash of a file is not exactly speedy). This is where the index, also known as the cache, comes into play again. The index also has fields for the file size and file modification time. Thus one can simply stat(2) a file on disk and compare against the index's file size and file modification time to know whether to hash the file or not.
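The stat shortcut could look roughly like the following Python sketch. `IndexEntry`, `make_entry` and `is_modified` are hypothetical names, and the entry keeps only the size, modification time and blob ID; Git's actual index stores more fields and handles edge cases such as a file changing within the same timestamp tick.

```python
import hashlib
import os
from typing import NamedTuple


class IndexEntry(NamedTuple):
    size: int
    mtime_ns: int
    blob_id: str


def blob_id(data: bytes) -> str:
    """Content hash in Git's blob format (SHA-1 repositories assumed)."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def make_entry(path: str) -> IndexEntry:
    """Record what the file looked like when it was added to the cache."""
    st = os.stat(path)
    with open(path, "rb") as f:
        return IndexEntry(st.st_size, st.st_mtime_ns, blob_id(f.read()))


def is_modified(path: str, entry: IndexEntry) -> bool:
    st = os.stat(path)
    if st.st_size == entry.size and st.st_mtime_ns == entry.mtime_ns:
        # Cheap path: metadata matches, so assume unchanged and skip hashing.
        return False
    # Metadata differs: read and hash the file to see if the content changed.
    with open(path, "rb") as f:
        return blob_id(f.read()) != entry.blob_id
```

Most files in a working directory hit the cheap path, so only the few whose size or timestamp moved need to be read and hashed at all.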
You can find your answer in the free book Pro Git, in the chapter Git Internals.
This chapter explains how Git works under the hood.
As Leo stated, Git checks the SHA-1 of a file to see whether it has changed. You can check it like this (taken from Git Internals):
$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30
Then, write some new content to the file, and save it again:
$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
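The two IDs above can be reproduced without Git. The Python sketch below assumes a SHA-1 repository, where a blob's ID is the SHA-1 of the header "blob <size>" followed by a NUL byte and then the file content. `git_blob_id` is a hypothetical helper name, and unlike `git hash-object -w` it writes nothing to the object database.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git would assign to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # echo appends a trailing newline, so it is part of the hashed content.
    print(git_blob_id(b"version 1\n"))  # 83baae61804e65cc73a7201a7252750c76066a30
    print(git_blob_id(b"version 2\n"))  # 1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
```

Since the ID depends only on the content, changing a single character produces a completely different hash, which is what lets Git tell the two versions apart.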