Wikipedia explains the automatic rename detection:
Briefly, given a file in revision N, a file of the same name in revision N−1 is its default ancestor. However, when there is no like-named file in revision N−1, Git searches for a file that existed only in revision N−1 and is very similar to the new file.
Rename detection apparently boils down to similar file detection. Is that algorithm documented anywhere? It would be nice to know what kinds of transformations are detected automatically.
Git keeps track of changes to files in the working directory of a repository by their name. When you move or rename a file, Git doesn't see that a file was moved; it sees that there's a file with a new filename, and the file with the old filename was deleted (even if the contents remain the same).
We use the git mv command in git to rename and move files. We only use the command for convenience. It does not rename or move the actual file, but rather deletes the existing file and creates a new file with a new name or in another folder.
Git tracks file contents, not filenames. So renaming a file without changing its content is easy for git to detect. (Git does not track, but performs detection; using git mv
or git rm
and git add
is effectively the same.)
When a file is added to the repository, the filename is in the tree object. The actual file contents are added as a binary large object (blob) in the repository. Git will not add another blob for additional files that contain the same content. In fact, Git cannot as the content is stored in the filesystem with first two characters of the hash being the directory name and the rest being the name of file within it. So to detect renames is a matter of comparing hashes.
To detect small changes to a renamed file, Git uses certain algorithms and a threshold limit to see if this is a rename. For example, have a look at the -M
flag for git diff
. There are also configuration values such as merge.renameLimit
(the number of files to consider when performing rename detection during a merge).
To understand how git treats similar files (i.e., what file transformations are considered as renames), explore the configuration options and flags available, as mentioned above. You need not be considered with the how. To understand how git actually accomplishes these tasks, look at the algorithms for finding differences in text, and read the git source code.
Algorithms are applied only for diff, merge, and log purposes -- they do not affect how git stores them. Any small change in file content means a new object is added for it. There is no delta or diff happening at that level. Of course, later, the objects might be packed where deltas are stored in packfiles, but that is not related to the rename detection.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With