Summary: What are the best practices for long-running tracking of an upstream repository when you want to maintain a set of local changes?
I want to keep a fork on GitHub up to date with the upstream but still allow clear tracking of the changes unique to the fork. (For this discussion, assume that upstream points to the main project repository and that origin refers to my fork of the repository.)
Imagine I have something like this where I forked a repository when upstream/master was at E.
Upstream:
A-B-C-D-E-F
Fork:
A-B-C-D-E ----- P ------T
         \-L-M-/ \-Q-R-/
After forking the repository I created two feature branches (L-M and Q-R) to add new features I needed and merged them back to my origin/master. So now my branch has improvements that don't exist upstream.
I find that upstream has a couple of interesting fixes so I want to get back into sync with upstream. Based upon most references I have found (git hub fork), the recommended way to do this is to merge upstream/master into your origin/master and continue on your way. So I would issue commands like:
git checkout master
git fetch upstream
git merge upstream/master
git push
Then I would end up with repositories that look like this:
Upstream:
A-B-C-D-E-F
Fork:
A-B-C-D-E ----- P ------T-F'
         \-L-M-/ \-Q-R-/
There are a couple of problems I see with this though.
1. I don't actually have commit F in my repo. I have F', which has the same content but a different hash, so I can't easily cross-reference commits between the two repositories and know that I have a given change. (It gets even more complex when you consider that upstream probably has more than one change and has its own set of feature branches that have been merged.)
2. As I move forward and continue doing this, it becomes increasingly difficult for me to know what changes I have in my repository beyond what is in the upstream. For example, I may submit some of these changes back upstream while continuing to add my own refinements. After several iterations of this, how does anyone looking at my repository know how it differs from upstream? (Is there a git command to find these changes?)
3. Similar to #2, how would someone find a fix in upstream and check whether my fork contains that fix?
I guess the root of the problem is there is no way for me to guarantee that my repository is in "sync" with the upstream at any given point because the code and the hashes are not the same. So how do I go about tracking the changes accurately and keep myself from going insane trying to keep things in sync?
Note: I had considered using rebase to keep rebasing my repository onto upstream, but this has an entirely different set of issues. For example, if anyone references my repository through submodules, branches, etc., then the history rewrite will break their references. Additionally, I don't think my branch history would survive the rebase, so I would not have a complete view of all the feature branches I had made and their associated history.
How do other people handle this? What are some best practices I should be looking into?
Update:
Based upon feedback from Seth, I created a set of test repositories to show what I was talking about and how it works out the way he says.
The repositories are:
They should show more clearly how merging from upstream looks when there are local changes as well.
Syncing a fork branch from the web UI On GitHub, navigate to the main page of the forked repository that you want to sync with the upstream repository. Select the Sync fork dropdown. Review the details about the commits from the upstream repository, then click Update branch.
Private forks inherit the permissions structure of the upstream or parent repository. For example, if the upstream repository is private and gives read/write access to a team, then the same team will have read/write access to any forks of the private upstream repository.
GitHub has now introduced a feature to sync a fork with the click of a button. Go to your fork, click on Fetch upstream , and then click on Fetch and merge to directly sync your fork with its parent repo. You may also click on the Compare button to compare the changes before merging.
You can check tracking branches by running the "git branch" command with the "-vv" option. You can set the upstream branch using the "git push" command with the "-u" option:
$ git push -u origin branch
Total 0 (delta 0), reused 0 (delta 0)
 * [new branch]      branch -> branch
Branch 'branch' set up to track remote branch 'branch' from 'origin'.
You are incorrect in your assumption. You said in your text example that you would be running the git merge command. If you really meant this, and not git cherry-pick (and for the record, git merge is the best practice in this situation), then you do NOT get F' in your branch, you get F. Perhaps a picture:
After the fetch but before the merge, your repos look like this:
Upstream:
A-B-C-D-E-F [master]
Fork:
         /-F                  [upstream/master]
A-B-C-D-E ----- P ------T     [master]
         \-L-M-/ \-Q-R-/      [Other branches]
After you merge, your repo will look like this:
Fork:
         /-F-------------\    [upstream/master]
A-B-C-D-E ----- P ------T-U   [master]
         \-L-M-/ \-Q-R-/      [Other branches]
New commit "U" in your repo will be a merge commit, just like commits "P" and "T".
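If you want to convince yourself of this locally, here is a minimal sketch that rebuilds the diagrams in throwaway repositories. The temporary directory, the one-file-per-commit layout, the feature-L and feature-Q branch names, and the demo identity are all made up for illustration; only the git commands themselves matter. It also shows one way to answer the "how does my fork differ" and "do I have this fix" questions.

```shell
#!/bin/sh
# Toy reproduction of the diagrams above. All paths, file names and
# branch names are hypothetical; nothing here touches a real repository.
set -eu

work=$(mktemp -d)
# Identity for the throwaway commits, so global git config is not required.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream with commits A..E on master.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
for c in A B C D E; do
    echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"
done

# The fork: a clone whose remote is named "upstream".
git clone -q -o upstream "$work/upstream" "$work/fork"
cd "$work/fork"
# Two feature branches, each merged back with a merge commit (P and T).
for pair in "L M" "Q R"; do
    branch="feature-${pair%% *}"
    git checkout -q -b "$branch" master
    for c in $pair; do
        echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"
    done
    git checkout -q master
    git merge -q --no-ff -m "merge $branch" "$branch"
done

# Upstream gains commit F after the fork was made.
cd "$work/upstream"
echo F > F.txt; git add F.txt; git commit -q -m F
f_hash=$(git rev-parse HEAD)

# Sync the fork the way the question describes; this creates merge commit U.
cd "$work/fork"
git fetch -q upstream
git merge -q -m "merge upstream/master" upstream/master

# Question 3: is upstream's F, under its original hash, part of my master?
if git merge-base --is-ancestor "$f_hash" master; then
    contains_f=yes
else
    contains_f=no
fi
echo "fork contains upstream F ($f_hash): $contains_f"

# Question 2: which commits exist only in the fork (L, M, P, Q, R, T, U)?
git log --oneline upstream/master..master
fork_only=$(git rev-list --count upstream/master..master)
# And how many upstream commits has the fork not merged yet?
missing=$(git rev-list --count master..upstream/master)
echo "fork-only commits: $fork_only, upstream commits not yet merged: $missing"
```

Because the merge keeps F as-is, plain ancestry checks (git merge-base --is-ancestor, or git branch --contains) work across both repositories with the upstream hash.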
git cherry-pick would create "F'" as you indicated in your example. Don't do that. git rebase can sometimes preserve merged branches (git rebase -p), but it doesn't always work. Also, that is rewriting public history, which is a bad idea.
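For contrast, a small sketch of what the cherry-pick route does to the hashes, again in throwaway repositories with made-up file names and a made-up demo identity. It assumes a fork with one local commit P and an upstream that has gained F. After the cherry-pick the fork holds F' under a new hash, so ancestry checks no longer find F; git cherry can still spot the equivalent patch, but only by comparing patch content.

```shell
#!/bin/sh
# Hypothetical throwaway repositories; only the git behaviour is the point.
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream starts with a single commit E.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
echo E > E.txt; git add E.txt; git commit -q -m E

# Fork it, then let upstream gain F.
git clone -q -o upstream "$work/upstream" "$work/fork"
echo F > F.txt; git add F.txt; git commit -q -m F
f_hash=$(git rev-parse HEAD)

# The fork adds its own commit P, then cherry-picks F instead of merging.
cd "$work/fork"
echo P > P.txt; git add P.txt; git commit -q -m P
git fetch -q upstream
git cherry-pick "$f_hash" > /dev/null
f_prime=$(git rev-parse HEAD)

echo "upstream F: $f_hash"
echo "fork F':    $f_prime"

# By hash, the fork does not contain F at all.
if git merge-base --is-ancestor "$f_hash" master; then
    by_hash=yes
else
    by_hash=no
fi
echo "F reachable from fork master by hash: $by_hash"

# git cherry compares patch content instead: a leading "-" marks an
# upstream/master commit that has an equivalent patch in master.
by_patch=$(git cherry master upstream/master)
echo "git cherry master upstream/master: $by_patch"
```

The patch-content comparison only works while the cherry-picked change applies unmodified, which is one more reason to prefer the merge.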
I have a document on git best practices: "Commit Often, Perfect Later, Publish Once". You might specifically want to investigate the workflow section for further inspiration.