 

How does git track file changes internally?


Could somebody explain how git knows internally that files X, Y and Z have changed? What is the process behind the scenes that recognizes when a file has not yet been added or has modifications? I am asking because, with Subversion it's simple to figure out that it keeps track of these things by having a .svn directory under each folder, but for git I can't seem to find a description of the inner workings of this. I doubt it scans through all the sub-directories for changes, as it's quite fast.

So, out of curiosity, what are its inner workings?

carlspring asked Apr 02 '13 13:04




2 Answers

The mechanism by which git determines the status of a file is fairly straightforward. To know which files have been staged, it simply diffs the HEAD tree against the index. Any items that appear only in the index have been staged for addition, any items that appear only in HEAD have been staged for removal, and any items that differ between the two have had changes staged.

Similarly, unstaged changes are detected by diffing the index against the working directory.
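To make the two comparisons above concrete, here is a minimal Python sketch. It is not git's actual code: it assumes each snapshot (HEAD tree, index, working directory) can be reduced to a mapping from path to content hash, and the function name, paths, and hash values are all made up for illustration.

```python
def diff_snapshots(old, new):
    """Classify paths by comparing two {path: content_hash} mappings."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in old if p in new and old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical snapshots; the hash strings are placeholders, not real object IDs.
head = {"README": "a1", "main.c": "b2", "old.txt": "c3"}
index = {"README": "a1", "main.c": "b9", "new.txt": "d4"}
worktree = {"README": "a7", "main.c": "b9", "new.txt": "d4"}

# HEAD vs. index: the staged changes.
staged = diff_snapshots(head, index)

# Index vs. working directory: the unstaged changes.
# (Paths that show up as "added" here would be untracked files.)
unstaged = diff_snapshots(index, worktree)

print(staged)
print(unstaged)
```

In this example main.c, new.txt and old.txt have staged changes, while README has been edited but not yet staged.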

Your question in particular asks how this can be so fast (after all, computing the SHA-1 hash of every file is not exactly speedy). This is where the index, also known as the cache, comes into play again. Each index entry also records the file's size and modification time. Git can therefore simply stat(2) a file on disk and compare the result against the size and modification time stored in the index to decide whether the file needs to be hashed at all.
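A rough sketch of that shortcut in Python, under some simplifying assumptions: the cached entry is a plain dict (a real index entry holds more stat fields than these two), the helper names are invented for this example, and a plain SHA-1 over the file's bytes stands in for git's object hash.

```python
import hashlib
import os


def content_hash(path):
    """Hash the file's bytes (a stand-in for git's blob hash)."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def make_entry(path):
    """Record the stat data and hash, as would happen when a file is staged."""
    st = os.stat(path)
    return {
        "size": st.st_size,
        "mtime_ns": st.st_mtime_ns,
        "sha1": content_hash(path),
    }


def is_modified(path, entry):
    """Return True if the file differs from the cached entry."""
    st = os.stat(path)
    if st.st_size == entry["size"] and st.st_mtime_ns == entry["mtime_ns"]:
        # Cheap path: stat data matches, so assume the content is unchanged
        # and skip reading the file altogether.
        return False
    # Stat data differs: only now pay for reading and hashing the file.
    return content_hash(path) != entry["sha1"]
```

The point of the design is that the common case (nothing changed) costs one stat call per file, and the expensive hash is only computed for files whose size or timestamp no longer matches.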

Edward Thomson answered Oct 04 '22 00:10


You can find your answer in the free book Pro Git, in the chapter "Git Internals".

That chapter explains how git works under the hood.

As Leo stated, git checks the SHA-1 of a file to see whether it has changed. You can check this yourself like so (taken from Git Internals):

$ echo 'version 1' > test.txt
$ git hash-object -w test.txt
83baae61804e65cc73a7201a7252750c76066a30

Then, write some new content to the file, and save it again:

$ echo 'version 2' > test.txt
$ git hash-object -w test.txt
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a
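If you want to see where those IDs come from, the following Python sketch reproduces them without calling git. It assumes a repository using the default SHA-1 object format, where a blob's ID is the SHA-1 of a short header ("blob", the content length in bytes, and a NUL byte) followed by the content. The function name is just for illustration.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content."""
    # Header: object type, a space, the content length in bytes, then NUL.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# echo appends a newline, so the file contents are "version 1\n" and "version 2\n".
print(git_blob_hash(b"version 1\n"))
print(git_blob_hash(b"version 2\n"))
```

The two printed values should match the IDs shown by git hash-object above, which is why any change to a file's content shows up as a different hash.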
stdcall answered Oct 03 '22 23:10