How does git detect similar files, for its rename detection?

Tags: git

Wikipedia explains the automatic rename detection:

Briefly, given a file in revision N, a file of the same name in revision N−1 is its default ancestor. However, when there is no like-named file in revision N−1, Git searches for a file that existed only in revision N−1 and is very similar to the new file.

Rename detection apparently boils down to similar file detection. Is that algorithm documented anywhere? It would be nice to know what kinds of transformations are detected automatically.

mahemoff asked Oct 29 '11 11:10



1 Answer

Git tracks file contents, not filenames, so a file renamed without any change to its content is easy for git to detect. (Git does not record renames; it detects them after the fact. Using git mv is effectively the same as using git rm and git add.)

When a file is added to the repository, its name is recorded in a tree object, while its contents are stored as a binary large object (blob) identified by a hash of those contents. Git will not store a second blob for another file with the same content; both tree entries simply point to the same blob. In fact, Git cannot, because a loose object is stored in the filesystem with the first two characters of the hash as the directory name and the rest as the name of the file within it. So detecting a rename with unchanged content is a matter of comparing hashes.
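The hash comparison described above can be sketched in a few lines of Python. This is only an illustration, not git's own code: it assumes a repository using the default SHA-1 object format, and the helper names (git_blob_id, loose_object_path, exact_renames) and the sample paths are made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash content the way git hashes a blob: header "blob <size>\\0" + data."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex characters name the directory, the rest name the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def exact_renames(old_tree: dict, new_tree: dict) -> dict:
    """Pair deleted and added paths whose blob IDs are identical.

    Trees are modelled here as plain {path: blob_id} dicts (a simplification).
    """
    deleted = {p: oid for p, oid in old_tree.items() if p not in new_tree}
    added = {p: oid for p, oid in new_tree.items() if p not in old_tree}
    deleted_by_id = {oid: p for p, oid in deleted.items()}
    return {deleted_by_id[oid]: p for p, oid in added.items() if oid in deleted_by_id}


# Same content under two different names yields the same blob ID.
old_tree = {"README.txt": git_blob_id(b"hello\n")}
new_tree = {"docs/README.md": git_blob_id(b"hello\n")}
print(exact_renames(old_tree, new_tree))
```

The IDs this produces should match what git hash-object prints for the same bytes, which is an easy way to check the sketch against a real installation.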

To detect a rename where the file was also changed slightly, Git computes a similarity score between the deleted file and the added file and treats the pair as a rename if the score reaches a threshold. For example, have a look at the -M flag for git diff, which accepts an optional threshold (such as -M90%) and defaults to 50%. There are also configuration values such as merge.renameLimit (the number of files to consider when performing rename detection during a merge).
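To make the idea of a threshold concrete, here is a rough sketch. It is not git's similarity algorithm: it substitutes Python's difflib.SequenceMatcher ratio, computed over lines, for git's own score, and the function names are invented for the example. Only the 50% default mirrors the default of -M.

```python
from difflib import SequenceMatcher


def similarity(old_text: str, new_text: str) -> float:
    """Return a 0.0-1.0 score for how much two files have in common.

    Stand-in metric: difflib's ratio over lines, NOT git's similarity index.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    return SequenceMatcher(None, old_lines, new_lines, autojunk=False).ratio()


def looks_like_rename(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    """Treat a deleted/added pair as a rename when the score reaches the threshold."""
    return similarity(old_text, new_text) >= threshold


deleted_file = "alpha\nbeta\ngamma\ndelta\n"
added_file = "alpha\nbeta\ngamma\nepsilon\n"   # one line of four changed
unrelated_file = "one\ntwo\n"

print(looks_like_rename(deleted_file, added_file))      # small edit: counted as a rename
print(looks_like_rename(deleted_file, unrelated_file))  # nothing in common: not a rename
```

Raising the threshold, as -M90% does for git diff, makes the check stricter: in this sketch the one-line edit above would no longer count as a rename at 0.9.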

To understand how git treats similar files (i.e., which file transformations are considered renames), explore the configuration options and flags mentioned above; for that, you need not be concerned with the how. To understand how git actually accomplishes these tasks, look at the algorithms for finding differences in text, and read the git source code.

These algorithms are applied only for diff, merge, and log purposes; they do not affect how git stores files. Any change in file content, however small, means a new object is added for it. There is no delta or diff happening at that level. Of course, the objects might later be packed, and packfiles do store deltas, but that is not related to rename detection.

manojlds answered Sep 20 '22 19:09