Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How does git track source code moved between files?

Apparently, when you move a function from one source code file to another, the git revision log (for the new file) can show you where that code fragment was originally coming from (see for example the Viewing History section in this tutorial).

How does this work?

like image 464
Thilo Avatar asked Nov 13 '09 12:11

Thilo


People also ask

Does git track file moves?

Git keeps track of changes to files in the working directory of a repository by their name. When you move or rename a file, Git doesn't see that a file was moved; it sees that there's a file with a new filename, and the file with the old filename was deleted (even if the contents remain the same).

How does git know which files have changed?

Indexing. For every tracked file, Git records information such as its size, creation time and last modification time in a file known as the index. To determine whether a file has changed, Git compares its current stats with those cached in the index. If they match, then Git can skip reading the file again.

How does git keep track of changes made by each user to their local copy?

The tree object is how Git keeps track of file names and directories. There is a tree object for each directory. The tree object points to the SHA-1 blobs, the files, in that directory, and other trees, sub-directories at the time of the commit.

How do you check changes before commit?

If you just want to see the diff without committing, use git diff to see unstaged changes, git diff --cached to see changes staged for commit, or git diff HEAD to see both staged and unstaged changes in your working tree.


1 Answers

It doesn't track them. That's the beauty of it.

Git only records snapshots of the entire project tree: here's what all files looked like before the commit and here's how they look like after. How we got from here to there, Git doesn't care.

This allows intelligent tools to be written after a commit has already happened, to extract information from that commit. For example, rename detection in Git is done by comparing all deleted files against all new files and comparing pairwise similarity metrics. If the similarity metric is greater than x, they are considered renamed, if it is between y and x (y < x), it is considered to be a rename+edit, and if it is below y, they are considered independent. The cool thing is that you, as a "commit archaeologist", can specify after the fact, what x and y should be. This would not work if the commit simply recorded "this file is a rename of that file".

Detecting moved content works similar: you slice every file into pieces, compute similarity metrics between all the slices and can then deduce that this slice which was deleted over here and this very similar slice which was added over there are actually the same slice that was moved from here to there.

However, as tonfa mentioned in his answer, this is very expensive, so it is not normally done. But it could be done, and that's the point.

BTW: this is pretty much the exact opposite of the Operational Transformation model used by Google Wave, EtherPad, Gobby, SubEthaEdit, ACE and Co.

like image 103
Jörg W Mittag Avatar answered Oct 15 '22 10:10

Jörg W Mittag