
Git move directory from one branch to another within the same repository while preserving history?

Tags:

git

How do I move a directory from one branch to another in Git while preserving its history?

I want to do this within the same repository.

shruti hegde asked May 28 '19 07:05


1 Answer

TL;DR

You can't (quite) get what you want—but if you try sometimes, you might just find, you get what you need. 😀

All you have to do is rename the directory and commit. You may first need to extract the directory from the other commit, if it's not already in the commit at the tip of the current branch. That is, you might need an initial:

git checkout otherbranch -- path/to/directory
git commit                                      # optional, but see below

and then in any case, run:

git mv path/to/directory new/path/to/dir

and then git commit the result. That doesn't do what you want, but it might do what you need—especially if you make that first commit that doesn't have the renaming, so that you have two adjacent commits, one with the old names, and one with the new ones.

You may, instead, want to merge the branches, commit the merge, and only then rename and commit again. Whether you want this, and why, requires the long explanation.

Long

It's important to understand two things here:

  • Git stores only files, not directories.[1] Each commit stores a full snapshot of all of the files in your project.
  • Files don't have history, in Git. In Git, the commits are the history.

People often object to the idea that Git stores snapshots, because git log -p shows patches, i.e., changes. You view commit 0a36ca1, say, and you see some change to README.md. Then git log goes on to commit 0a36ca1's parent commit 922bf37, say, and you see another change to README.md and/or some other files, and so on. Doesn't that mean 0a36ca1 just stores the changes to README.md? And the answer is: no, 0a36ca1 stores a full copy of README.md and all the other files. Git showed changes by inspecting both 922bf37—the parent of 0a36ca1, i.e., the commit that comes just before 0a36ca1—and 0a36ca1. Both commits have copies of every file. Git compared the two commits' files. All of the files in those two commits matched except for README.md. Git then compared the two README.md versions to see what changed, and showed you what changed in that file.

The git show command is similar, except that you typically give it the hash ID of one commit. It prints that commit's metadata (who made it, when, and why, i.e., the log message) and then compares the snapshot in the parent to the snapshot in that commit. The files that differ are the ones you see.
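You can check the snapshot claim yourself. Here is a minimal sketch, assuming a throwaway repository in a temporary directory; the file names, contents, commit messages, and the Example identity are all made up for illustration:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# First commit: two files.
printf 'version 1\n' > README.md
printf 'unchanged\n' > other.txt
git add README.md other.txt
git commit -q -m "first"

# Second commit: only README.md changes.
printf 'version 2\n' > README.md
git commit -q -am "second"

# Both snapshots list both files, even though other.txt never changed.
files_in_parent=$(git ls-tree -r --name-only HEAD~1 | wc -l | tr -d ' ')
files_in_child=$(git ls-tree -r --name-only HEAD | wc -l | tr -d ' ')
echo "files in parent snapshot: $files_in_parent"
echo "files in child snapshot:  $files_in_child"

# A complete README.md can be read out of each commit.
git show HEAD~1:README.md
git show HEAD:README.md

# The "change" is computed on demand by comparing the two snapshots.
changed=$(git diff --name-only HEAD~1 HEAD)
echo "changed between snapshots: $changed"
```

Both commits list other.txt even though only README.md was edited; the diff is something Git works out when you ask for it.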

When you ask for history by running git log or git log master, Git:

  1. Starts at the current commit (git log), or the last commit in master, and shows you that commit à la git show.[2]
  2. Then, moves back to the parent of the commit it just showed. This gets complicated at merge commits, but for now, just think of a nice simple linear chain.

This repeats until Git runs out of parents, or you get tired of paging through git log output. Given a nice simple linear chain of commits, like:

A <-B <-C <-D <-E <-F <-G   <-- master

(the single uppercase letters stand in here for the big ugly hash IDs that Git really uses), Git starts by showing you G (as found by the name master), then moving to F and showing you F, then moving to E and showing E, and so on. Commit A is the very first commit in the repository—it has no parent; there's no backwards arrow coming out of A to let Git move left—so git show shows it as having every file created from scratch. That means git log -p shows it the same way. And of course, with no parent, there's no arrow to follow backwards.
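The backwards walk is easy to watch on a tiny chain. This is a sketch under assumptions: a scratch repository, a three-commit chain A, B, C standing in for the longer one above, and a made-up file name and identity:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Name the initial branch master, as in the diagram.
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Build the chain A <- B <- C, one small change per commit.
for name in A B C; do
    echo "$name" >> file.txt
    git add file.txt
    git commit -q -m "$name"
done

# git log starts at the tip found via the name master and walks to parents.
walk=$(git log --format=%s master | paste -sd' ' -)
echo "walk order: $walk"

# Each line is "<commit> <parents...>"; the last line has no parent hash.
git rev-list --parents master

# The root commit is the one with zero parents.
root_subject=$(git log --max-parents=0 --format=%s master)
echo "root commit: $root_subject"
```

The walk prints the newest commit first and stops at A, because A has no parent to move back to.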


[1] Technically, directories turn into internal tree objects, but Git won't store an empty directory for the simple reason that you can't get a directory into Git's index, and Git doesn't build commits from the work-tree, but rather from the index. It's easier to think of Git as just storing files, since that's the end effect.

[2] This assumes you're using git log -p, of course. There are several important differences between git log and git show: first, git log does this backwards walk; second, git log defaults to not showing a patch; and third, the two commands show merge commits differently by default. git log -p defaults to saying to itself: ugh, a merge commit, that's too hard: I'll just print the log message and move on, without showing a diff at all. The git show default is to show a combined diff, which is a reduced form of diff against multiple parents.


git log can show a subset of history

You can, instead of just running git log or git log master, run:

git log master -- path/to/file.ext

and see what appears to be the history of path/to/file.ext. What git log is doing here is walking commit history as usual, but then not showing some of the commits. That is, given our simple linear chain above, git log starts with commit G. It compares (the snapshots of) F and G to see what files changed. If those files include path/to/file.ext, git log shows commit G; if not, it shows nothing for G. Either way, it then moves back to commit F and repeats.

In other words, instead of just showing you all the commits it walks, git log can show you selected commits from the walk. The result is that it seems like Git has file history—but it doesn't: it's just synthesizing a subset history from the real history, working as it goes.
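Here is a small sketch of that filtering, assuming a scratch repository with three commits of which only the first and third touch path/to/file.ext. The other file name, the commit messages, and the identity are invented for the example:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Commit 1 creates the file we care about.
mkdir -p path/to
printf 'one\n' > path/to/file.ext
git add path/to/file.ext
git commit -q -m "add file.ext"

# Commit 2 touches something else entirely.
printf 'unrelated\n' > notes.txt
git add notes.txt
git commit -q -m "add notes.txt"

# Commit 3 edits the file again.
printf 'two\n' >> path/to/file.ext
git commit -q -am "edit file.ext"

# The full walk visits all three commits.
all=$(git log --format=%s | paste -sd, -)
echo "all commits:    $all"

# With a path, the middle commit is walked but not shown.
subset=$(git log --format=%s -- path/to/file.ext | paste -sd, -)
echo "file 'history': $subset"
```

The second listing is the same walk with one commit left out, which is all that "file history" means here.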

This is important because when Git is doing this synthetic file-history creation, git log is modifying the commit walk. The git log documentation calls this History Simplification, and it's complicated. There are a half dozen or so git log options to control how history simplification will be performed. This means that the "file history" that you see with git log depends on what options you pass to git log, as well as what the actual commit history is.

(Read and study the History Simplification section, at least someday, because there is a lot to it. I've been using Git for a long time, and like to think I know a lot about it, but even then I have to refer back to the documentation for this. In particular, the notion of "TREESAME" (which applies after subtracting away unwanted tree components from each commit) and the question of which commits are followed at merges are especially tricky.)

With --follow, git log will try to detect renames

As Git is doing this commit-by-commit, backwards traversal of a chain of commits, the diff from parent to child may show that some file is renamed. A file named README may have been renamed to one named README.md, for instance. A simple:

git log master -- README.md

will show you how README.md evolved over time (backwards), but stop when README.md was named README, because it's looking for README.md and commits from here on back don't have README.md.

When you add --follow to git log, it will follow that one file—it only works with one file!—across the rename, simply by changing which file it's looking for. Having detected that at, say, the commit-D-to-E boundary, the file that is now README.md was called README in commit D, git log stops looking for README.md and starts, with D, looking for changes to a file named README. It's really that simple.
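To see the difference --follow makes, here is a sketch on a four-commit chain in a scratch repository. The file contents, commit messages, and identity are assumptions; only the README to README.md rename comes from the text above:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Two commits while the file is still called README.
printf 'line 1\nline 2\nline 3\n' > README
git add README
git commit -q -m "add README"
printf 'line 4\n' >> README
git commit -q -am "edit README"

# A rename-only commit, then one more edit under the new name.
git mv README README.md
git commit -q -m "rename README to README.md"
printf 'line 5\n' >> README.md
git commit -q -am "edit README.md"

# Without --follow, the listing stops at the rename.
plain=$(git log --format=%s -- README.md | wc -l | tr -d ' ')
echo "commits shown without --follow: $plain"

# With --follow, git log switches to looking for README at the rename.
followed=$(git log --follow --format=%s -- README.md | wc -l | tr -d ' ')
echo "commits shown with --follow:    $followed"
```

Keeping the rename in a commit of its own, with no content change, gives Git the easiest possible rename to detect.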

--follow is too simple for your use case

The problem here is that --follow is that simple, which is too simple. So it won't do what you want, for two reasons:

  • First, you're talking about copying files across some fairly large gap:

    ...--F--G--H   <-- master
     \
      N--O--P   <-- branch
    

    If your directory-full-of-files is in commit H on master, and you're just now copying it to a new commit you'll make on branch that comes after commit P, well, there's no backwards link from P to H. That's why I proposed that you commit the files without renaming them, then rename them and commit again. The result will be:

    ...--F--G--H   <-- master
     \
      N--O--P--Q--R   <-- branch
    

    where commit R has the files renamed, and Q has them not-renamed, just copied from H. In the commit log message for Q, you can state that the entire directory has been copied from branch master at a time when it pointed to commit H (use H's real hash ID here—run git rev-parse master to see what hash ID master specifies right now). Then you rename the directory and commit again to make them show up as renames, whenever Git walks from commit R back to commit Q.

  • The git log --follow option only works on one file. That is, given a commit that is or descends from R, and therefore has the new directory name, you must run:

    git log --follow [<commit-hash>] [--] new/path/to/dir/file.ext
    

    which will eventually work its way to commit R, show new/path/to/dir/file.ext (because it is renamed in commit R as compared to commit Q), then move back to commit Q and start looking for path/to/directory/file.ext.

From this single detected rename, plus the log messages in Q and R, you—a smart human rather than a dumb Git program that just obeys really simple rules—can conclude that, aha, all of those files came from commit H.
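The two-commit approach (Q, then R) might look like this end to end. It is a sketch under assumptions: a scratch repository, a second branch literally named branch, made-up file contents and commit messages, and a tiny history standing in for the longer one in the diagrams. Note the mkdir -p, since git mv expects the destination's parent directory to exist:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Shared starting point, then the two branches diverge.
printf 'shared\n' > base.txt
git add base.txt
git commit -q -m "base"
git branch branch

# Commit H on master adds the directory.
mkdir -p path/to/directory
printf 'alpha\nbeta\ngamma\ndelta\n' > path/to/directory/file.ext
git add path/to/directory
git commit -q -m "H: add directory on master"
h=$(git rev-parse master)

# Commit P: unrelated work on the other branch.
git checkout -q branch
printf 'work on branch\n' > work.txt
git add work.txt
git commit -q -m "P: work on branch"

# Commit Q: copy the directory from master, names unchanged, and record H.
git checkout master -- path/to/directory
git commit -q -m "Q: copy path/to/directory from master at $h"

# Commit R: rename only.
mkdir -p new/path/to
git mv path/to/directory new/path/to/dir
git commit -q -m "R: rename path/to/directory to new/path/to/dir"

# --follow crosses the rename in R and then finds the copy in Q.
followed=$(git log --follow --format=%s -- new/path/to/dir/file.ext | cut -c1 | paste -sd' ' -)
echo "commits shown with --follow: $followed"

# H is not part of this branch's history; only Q's log message points to it.
if git merge-base --is-ancestor "$h" HEAD; then reachable=yes; else reachable=no; fi
echo "H reachable from branch: $reachable"
```

The trail ends at Q, so the hash recorded in Q's log message is the only link back to H.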

This is where you may want a real merge. Instead of just copying the files from H, you can literally make commit Q as a merge commit, connecting the history from commit Q back to both commits: P and H. That is, suppose you end up with:

...--F--G--H   <-- master
 \          \
  N--O--P----Q--R   <-- branch

Now when Git walks through commit history, it goes: R, Q, H-and-P, G-and-O, F-and-N, and so on. That is, git log walks through the actual history, one commit at a time, with a kind of complicated method of tracking through the fork in history where commits H and P merge to form commit Q.

The drawback to doing this merge is kind of obvious: it's a merge. It will by default bring in all the changes since some common ancestor—since whatever commit comes before N and before F where branch and master eventually lead back to a shared commit: a commit that's on both branches. You don't necessarily have to commit those changes, or even any changes: you can make commit Q's snapshot match commit P's, except of course for the new directory that you want.

(There are multiple ways to make this merge. How to achieve it is another StackOverflow question entirely, one that's already well answered. See (Git Merging) When to use 'ours' strategy, 'ours' option and 'theirs' option?, and also VonC's answer to a different question here. There are many options here but you probably would want to start with git merge -s ours --no-commit, if you want -s ours at all, and then the extraction of the files with git checkout <commit> -- <path>, and only then making commit Q as a merge.)
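One possible shape for that merge, as a sketch rather than a recommendation: it assumes a scratch repository, a branch literally named branch, and invented file names (including a master-only.txt that should stay out of the merge result). The two branches share a starting commit here; unrelated histories would need extra handling.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

printf 'shared\n' > base.txt
git add base.txt
git commit -q -m "base"
git branch branch

# Commit H on master: the wanted directory plus a file we do not want.
mkdir -p path/to/directory
printf 'alpha\nbeta\ngamma\ndelta\n' > path/to/directory/file.ext
printf 'master only\n' > master-only.txt
git add path/to/directory master-only.txt
git commit -q -m "H: add directory on master"
h=$(git rev-parse master)

# Commit P on the other branch.
git checkout -q branch
printf 'work on branch\n' > work.txt
git add work.txt
git commit -q -m "P: work on branch"

# Start a merge that keeps branch's snapshot and stops before committing.
git merge -q -s ours --no-commit master
# Take only the wanted directory out of master, then finish the merge as Q.
git checkout master -- path/to/directory
git commit -q -m "Q: merge master, taking only path/to/directory"

# Q is a real merge: it lists two parents, P and H.
parent_count=$(git log -1 --format=%P HEAD | wc -w | tr -d ' ')
echo "parents of Q: $parent_count"

# Commit R: rename only, as before.
mkdir -p new/path/to
git mv path/to/directory new/path/to/dir
git commit -q -m "R: rename path/to/directory to new/path/to/dir"

# H is now part of this branch's history.
if git merge-base --is-ancestor "$h" HEAD; then reachable=yes; else reachable=no; fi
echo "H reachable from branch: $reachable"

# List the commits git log associates with the renamed file.
git log --follow --format=%s -- new/path/to/dir/file.ext
```

Because the merge used -s ours, master-only.txt never appears on branch; only the directory checked out before the commit became part of Q.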

The advantage to the merge is that it ties the histories together, so that git log can walk from merge Q back to commit H, which is the source of actual history for the (pre-renaming) files. The disadvantage is that it ties the histories together, so that from then on, Git believes that the correct result of merging H with P is Q, even if you later change your mind about that.

If the merge isn't what you want, the commit(s) plus log message(s) may be what you need.

torek answered Oct 31 '22 12:10