How does git detect similar files, for its rename detection?

Tags:

git

Wikipedia explains the automatic rename detection:

Briefly, given a file in revision N, a file of the same name in revision N−1 is its default ancestor. However, when there is no like-named file in revision N−1, Git searches for a file that existed only in revision N−1 and is very similar to the new file.

Rename detection apparently boils down to similar file detection. Is that algorithm documented anywhere? It would be nice to know what kinds of transformations are detected automatically.

asked Oct 29 '11 11:10

mahemoff

1 Answer

Git tracks file contents, not filenames, so renaming a file without changing its content is easy for Git to detect. (Git does not record renames; it detects them after the fact, so using git mv is effectively the same as git rm followed by git add.)

When a file is added to the repository, its filename is recorded in a tree object, while its contents are stored as a binary large object (blob). Git will not add another blob for additional files that contain the same content. In fact, it cannot: a blob is identified by the hash of its content, and a loose object is stored on disk with the first two characters of the hash as the directory name and the rest as the file name within it. So detecting an exact rename is a matter of comparing hashes.
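As a rough illustration of the paragraph above, here is a minimal Python sketch of content-addressed blobs and exact rename detection. It assumes a SHA-1 repository; the function names (git_blob_id, loose_object_path, exact_renames) and the dict-based "trees" are hypothetical helpers for this example, not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash content the way a SHA-1 Git repository hashes a blob:
    SHA-1 over the header "blob <size>\\0" followed by the raw bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex characters are the directory, the rest is the file name."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def exact_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair each deleted path with an added path that has the same blob ID.

    old_tree and new_tree map path -> file content (bytes); this is a
    simplified stand-in for real tree objects (assumption of this sketch).
    """
    deleted = {p: git_blob_id(c) for p, c in old_tree.items() if p not in new_tree}
    added = {p: git_blob_id(c) for p, c in new_tree.items() if p not in old_tree}

    # Index deleted paths by blob ID; keep the first path for each ID.
    source_by_id = {}
    for path in sorted(deleted):
        source_by_id.setdefault(deleted[path], path)

    renames, used = [], set()
    for path in sorted(added):
        source = source_by_id.get(added[path])
        if source is not None and source not in used:
            renames.append((source, path))
            used.add(source)
    return renames


if __name__ == "__main__":
    before = {"README": b"hello\n", "a.txt": b"same content\n"}
    after = {"README": b"hello\n", "b.txt": b"same content\n"}
    print(exact_renames(before, after))
    print(loose_object_path(git_blob_id(b"same content\n")))
```

Because the ID depends only on the content, two files with identical bytes map to the same object, and a rename with no edits shows up as a deleted path and an added path that share one ID.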

To detect a rename where the file also changed slightly, Git computes a similarity score between candidate files and compares it with a threshold. For example, have a look at the -M flag for git diff, which takes an optional similarity threshold (50% by default). There are also configuration values such as merge.renameLimit (the number of files to consider when performing rename detection during a merge).
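The threshold idea can be sketched in Python as follows. This is not Git's actual algorithm: the similarity score here comes from difflib.SequenceMatcher as a stand-in, and detect_renames, its threshold parameter (mirroring the role of -M), and its rename_limit parameter (loosely mirroring merge.renameLimit) are hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(old: bytes, new: bytes) -> float:
    """Line-based similarity in [0, 1]; a stand-in for Git's own score."""
    matcher = SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    return matcher.ratio()


def detect_renames(old_tree: dict, new_tree: dict,
                   threshold: float = 0.5, rename_limit: int = 1000) -> list:
    """Pair deleted and added paths whose contents are similar enough.

    old_tree and new_tree map path -> content (bytes). Returns a list of
    (old_path, new_path, score) tuples, best matches first.
    """
    deleted = sorted(p for p in old_tree if p not in new_tree)
    added = sorted(p for p in new_tree if p not in old_tree)

    # Simplified limit (assumption): give up when there are too many candidates,
    # since every deleted file is compared against every added file.
    if len(deleted) > rename_limit or len(added) > rename_limit:
        return []

    candidates = []
    for src in deleted:
        for dst in added:
            score = similarity(old_tree[src], new_tree[dst])
            if score >= threshold:
                candidates.append((score, src, dst))

    # Greedily accept the highest-scoring pairs, using each path at most once.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    renames, used_src, used_dst = [], set(), set()
    for score, src, dst in candidates:
        if src in used_src or dst in used_dst:
            continue
        renames.append((src, dst, score))
        used_src.add(src)
        used_dst.add(dst)
    return renames


if __name__ == "__main__":
    before = {"old.py": b"a\nb\nc\nd\n"}
    after = {"new.py": b"a\nb\nc\nX\n"}
    print(detect_renames(before, after))                 # pair accepted at the default threshold
    print(detect_renames(before, after, threshold=0.9))  # stricter threshold rejects the pair
```

Raising the threshold makes the detection stricter, which is the same trade-off the -M option exposes; the pairwise comparison is also why a limit on the number of candidate files is useful.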

To understand how Git treats similar files (i.e., which file transformations count as renames), explore the configuration options and flags mentioned above; for that purpose you need not be concerned with the how. To understand how Git actually accomplishes these tasks, look at the algorithms for finding differences in text, and read the Git source code.

These algorithms are applied only for diff, merge, and log purposes; they do not affect how Git stores objects. Any small change in file content means a new object is added for it. There is no delta or diff happening at that level. Of course, the objects might later be packed, with deltas stored in packfiles, but that is unrelated to rename detection.

answered Sep 20 '22 19:09

manojlds
