
When should I use "git push --force-if-includes"

Tags: git, git-push

When I want to force push, I nearly always use --force-with-lease. Today I upgraded to Git 2.30 and discovered a new option: --force-if-includes.

After reading the updated documentation, it's still not entirely clear to me under which circumstances I would use --force-if-includes instead of --force-with-lease like I usually do.

TTT asked Nov 29 '22 21:11


1 Answer

The --force-if-includes option is, as you've noted, new. If you've never needed it before, you don't need it now. So the shortest answer to "when should I use this" would be "never". 🙃 The recommended answer is (or will be once it's proven?) always. (I'm not yet convinced one way or the other, myself.)

A blanket "always" or "never" is not very useful though. Let's look at where you might want to use it. It is, strictly speaking, never necessary because all it does is modify --force-with-lease slightly. So we already have --force-with-lease in effect, if --force-if-includes is going to be used.[1] Before we look at --force-if-includes we should cover how --force-with-lease actually works. What problem are we trying to solve? What are our "use cases" or "user stories" or whatever the latest buzzwords might be when someone is reading this later?

(Note: if you're already familiar with all of this, you can search for the next force-if-includes string to skip the next few sections, or just jump to the bottom and then scroll up to the section header.)

The fundamental problem we have here is one of atomicity. Git is, in the end, mostly—or at least significantly—a database, and any good database has four properties for which we have the mnemonic ACID: Atomicity, Consistency, Isolation, and Durability. Git doesn't exactly achieve any or all of these on its own: for instance, for the Durability property, it relies (at least partly) on the OS to provide it. But three of these—the C, I, and D ones—are local within a Git repository in the first place: if your computer crashes, your copy of the database may or may not be intact, recoverable, or whatever, depending on the state of your own hardware and OS.

Git is not, however, just a local database. It's a distributed one, distributed via replication, and its unit of atomicity—the commit—is spread out across multiple replications of the database. When we make a new commit locally, we can send it to some other copy or copies of the database, using git push. Those copies will try to provide their own ACID behavior, locally on those computers. But we'd like to preserve atomicity during the push itself.

We can get this in several ways. One way is to start with the idea that every commit has a globally (or universally) unique identifier: a GUID or UUID.[2] (I'll use the UUID form here.) I can safely give you a new commit I've made as long as we both agree that it keeps the UUID I gave it, and you don't already have some other commit with that UUID.

But, while Git does use these UUIDs to find the commits, Git also needs to have a name for the commit—well, for the last commit in some chain. This guarantees that whoever is using the repository has a way to find the commit: the name finds the last one in some chain, from which we find all the earlier ones in the same chain.

If we both use the same name, we have a problem. Let's say we're using the name main to find commit b789abc, and they're using it to find the commit a123456.

The solution we use with git fetch here is simple: we assign a name to their Git repository, e.g., origin. Then, when we get some new commit(s) from them, we take their name—the one that finds the last of these commits in some chain, that is—and rename it. If they used the name main to find that tip commit, we rename that to origin/main. We create or update our own origin/main to remember their commits, and it does not mess with our own main.
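To see the renaming in action, here is a small sketch using throwaway repositories. The directory names (origin.git, ours, theirs) and commit messages are made up for the demo; only the git commands themselves are real.

```shell
# Scratch area; every path and name below is invented for this demo.
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Their" repository, plus our own repository that calls it origin.
git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "first commit"
git push -q origin main

# Someone else adds a commit to origin's main.
git clone -q "$work/origin.git" "$work/theirs"
cd "$work/theirs"
git commit -q --allow-empty -m "their new commit"
git push -q origin main
their_tip=$(git rev-parse HEAD)

# Back in our repository: fetch, then compare the two names.
cd "$work/ours"
main_before=$(git rev-parse refs/heads/main)
git fetch -q origin
main_after=$(git rev-parse refs/heads/main)
tracking=$(git rev-parse refs/remotes/origin/main)

echo "our main before fetch: $main_before"
echo "our main after fetch:  $main_after"
echo "our origin/main:       $tracking"
```

The fetch moves only origin/main (to their new commit); our own main stays where it was.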

But, when we're going the other way—pushing our commits to them—Git doesn't apply this idea. Instead, we ask them to update their main directly. We hand over commit b789abc for instance, and then ask them to set their main to b789abc. What they do, to make sure that they don't lose their a123456 commit, is make sure that a123456 is part of the history of our commit b789abc:

  ... <-a123456 <-b789abc   <--main

Since our main points to b789abc, and b789abc has a123456 as its parent, having them update their main to point to b789abc is "safe". For this to really be safe, they have to atomically replace their main, but we just leave that up to them.
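The "is the old tip part of the new tip's history" test can be tried by hand with git merge-base --is-ancestor. This is only a sketch of the idea, not a claim about what the receiving Git runs internally; the two commits here are stand-ins for a123456 and b789abc.

```shell
# Throwaway repository with two commits: "old" stands in for a123456,
# "new" stands in for b789abc, and new's parent is old.
repo=$(mktemp -d)
cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "old tip"
old=$(git rev-parse HEAD)
git commit -q --allow-empty -m "new tip"
new=$(git rev-parse HEAD)

# Moving a branch from $old to $new is a fast-forward exactly when
# $old is an ancestor of $new.
if git merge-base --is-ancestor "$old" "$new"; then
    forward=yes
else
    forward=no
fi

# The reverse move would drop the "new tip" commit, so it is not one.
if git merge-base --is-ancestor "$new" "$old"; then
    backward=yes
else
    backward=no
fi

echo "old -> new is a fast-forward: $forward"
echo "new -> old is a fast-forward: $backward"
```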

This method of adding commits to some remote Git repository works fine. What doesn't work is the case where we'd like to remove their a123456. We find there is something wrong or bad with a123456. Instead of making a simple correction, b789abc, that adds on to the branch, we make our b789abc so that it bypasses the bad commit:

... <-something <-a123456   <--main

becomes:

... <-something <-b789abc   <--main
               \
                a123456   ??? [no name, hence abandoned]

We then try to send this commit to them, and they reject our attempt with the gripe that it's not a "fast-forward". We add --force to tell them to do the replacement anyway, and (if we have appropriate permissions[3]) their Git obeys. This effectively drops the bad commit from their clone, just as we dropped it from ours.[4]
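Here is a sketch of that sequence against a local bare repository standing in for the hosting provider. The paths and commit messages are invented, and a local bare repository has no permission checks, so the forced push is simply accepted.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/origin.git"
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"

# Publish "something" followed by the bad commit.
git commit -q --allow-empty -m "something"
git commit -q --allow-empty -m "bad commit"
git push -q origin main

# Locally replace the bad commit with a sibling on the same parent.
git reset -q --hard HEAD~1
git commit -q --allow-empty -m "replacement commit"
good=$(git rev-parse HEAD)

# An ordinary push is refused: the remote tip is not in our history.
if git push -q origin main 2>/dev/null; then
    plain=accepted
else
    plain=rejected
fi

# With --force the same update goes through.
if git push -q --force origin main 2>/dev/null; then
    forced=accepted
else
    forced=rejected
fi

remote_tip=$(git --git-dir="$work/origin.git" rev-parse refs/heads/main)
echo "plain push: $plain, forced push: $forced"
```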


[1] As the documentation you linked notes, --force-if-includes without --force-with-lease is just ignored. That is, --force-if-includes doesn't turn on --force-with-lease for you: you have to specify both.

[2] These are the hash IDs, and they need to be unique across all Gits that will ever meet and share IDs, but not across two Gits that never meet. There, we can safely have what I call "doppelgängers": commits or other internal objects with the same hash ID, but different content. Still, it's best to just make them truly unique.

[3] Git as it is, "out of the box", does not have this kind of permissions checking, but hosting providers like GitHub and Bitbucket add it, as part of their value-adding thing to convince us to use their hosting systems.

[4] The un-find-able commit doesn't actually go away right away. Instead, Git leaves this for a later housekeeping git gc operation. Also, dropping a commit from some name may still leave that commit reachable from other names, or via log entries that Git keeps for each name. If so, the commit will stick around longer, perhaps even forever.


So far so good, but ...

The concept of a force-push is fine as far as it goes, but that's not far enough. Suppose we have a repository, hosted somewhere (GitHub or whatever), that receives git push requests. Suppose further that we are not the only person / group doing pushes.

We git push some new commit, then discover it's bad and want to replace it with a new and improved commit immediately, so we take a few seconds or minutes—however long it takes to make the new improved commit—and get that in place and run git push --force. For concreteness, let's say this whole thing takes us one minute, or 60 seconds.

That's sixty seconds during which someone else might:[5]

  • fetch our bad commit from the hosting system;
  • add a new commit of their own; and
  • git push the result.

So at this point, we think the hosting system has:

...--F--G--H   <-- main

where commit H is bad and needs replacement with our new-and-improved H'. But in fact, they now have:

...--F--G--H--I   <-- main

where commit I is from this other faster committer. Meanwhile, we now have, in our repository, the sequence:

...--F--G--H'  <-- main
         \
          H   ???

where H is our bad commit, that we're about to replace. We now run git push --force and since we are allowed to force-push, the hosting provider Git accepts our new H' as the last commit in their main, so that they now have:

...--F--G--H'  <-- main
         \
          H--I   ???

The effect is that our git push --force removed not only our bad H, but their (presumably still good, or at least, wanted) I.
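The race is easy to reproduce locally. In this sketch the second clone (theirs) plays the faster committer; all names are made up, and the commit messages G, H, I and H' just mirror the drawings above.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"

# We publish G and the bad commit H.
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H (bad)"
git push -q origin main

# Meanwhile someone else builds I on top of H and pushes it.
git clone -q "$work/origin.git" "$work/theirs"
cd "$work/theirs"
git commit -q --allow-empty -m "I"
git push -q origin main
commit_i=$(git rev-parse HEAD)

# We replace H with H' and force-push, unaware that I exists.
cd "$work/ours"
git commit -q --amend --allow-empty -m "H' (fixed)"
git push -q --force origin main

# Is I still reachable from origin's main?
if git --git-dir="$work/origin.git" merge-base --is-ancestor "$commit_i" refs/heads/main; then
    lost=no
else
    lost=yes
fi
echo "commit I lost from origin's main: $lost"
```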


[5] They might do this by rebasing a commit they'd already made, after finding their own git push blocked because they had based their commit on G originally. Their rebase automatically copied their new commit to the one we're calling I here, with no merge conflicts, enabling them to run git push in fewer seconds than it took us to make our fixed-up commit H'.


Enter --force-with-lease

The --force-with-lease option, which internally Git calls a "compare and swap", allows us to send a commit to some other Git, and then have them check that their branch name—whatever it is—contains the hash ID that we think it contains.

Let's add, to our drawing of our own repository, the origin/* names. Since we sent commit H to the hosting provider earlier, and they took it, we actually have this in our repository:

...--F--G--H'  <-- main
         \
          H   <-- origin/main

When we use git push --force-with-lease, we have the option of controlling this --force-with-lease completely and exactly. The complete syntax for doing this is:

git push --force-with-lease=refs/heads/main:<hash-of-H> origin <hash-of-H'>:refs/heads/main

That is, we'll:

  • send to origin commits ending with the one found via hash ID H';
  • ask them to update their name refs/heads/main (their main branch); and
  • ask them to force this update, but only if their refs/heads/main currently has in it the hash ID of commit H.

This gives us a chance to catch the case where some commit I has been added to their main. They, using the --force-with-lease=refs/heads/main:<hash> part, check their refs/heads/main. If it's not the given <hash>, they refuse the entire transaction, keeping their database intact: they retain commits I and H, and drop our new commit H' on the floor.[6]

The overall transaction—the forced-with-lease update of their main—has locking inserted so that if someone else is attempting to push some commit (perhaps I) now, the someone-else gets held off until we finish—fail or succeed—with our --force-with-lease operation.
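The fully spelled-out form can be tried against the same kind of throwaway setup. This is a sketch with invented paths; the shell variables hash_h and hash_h_prime stand in for <hash-of-H> and <hash-of-H'>.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H (bad)"
git push -q origin main
hash_h=$(git rev-parse HEAD)

# The other committer adds I on top of H.
git clone -q "$work/origin.git" "$work/theirs"
cd "$work/theirs"
git commit -q --allow-empty -m "I"
git push -q origin main
commit_i=$(git rev-parse HEAD)

# We make H' and push, stating that we expect their main to be H.
cd "$work/ours"
git commit -q --amend --allow-empty -m "H' (fixed)"
hash_h_prime=$(git rev-parse HEAD)
if git push -q --force-with-lease=refs/heads/main:"$hash_h" \
        origin "$hash_h_prime":refs/heads/main 2>/dev/null; then
    outcome=accepted
else
    outcome=rejected
fi

remote_tip=$(git --git-dir="$work/origin.git" rev-parse refs/heads/main)
echo "push with lease: $outcome"
```

Their main actually holds I rather than H, so the push is refused and commit I survives on origin.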

We usually don't spell all this out, though. Usually we would just run:

git push --force-with-lease origin main

Here, main provides both the hash ID of the last commit we want sent—H'—and the ref-name we want them to update (refs/heads/main, based on the fact that our main is a branch name). The --force-with-lease has no = part so Git fills in the rest: the ref name is the one we want them to update—refs/heads/main—and the expected commit is the one in our corresponding remote-tracking name, i.e., the one in our own refs/remotes/origin/main.

This all comes out the same: our origin/main provides the H hash, and our main provides the H' hash and all the other names. It's shorter and does the trick.
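For the short form, this sketch shows the happy path where nobody else has pushed in the meantime. The paths are invented; the point is just that the value Git expects is whatever our refs/remotes/origin/main holds.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/origin.git"
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H (bad)"
git push -q origin main

# With no "=" part, the expected value comes from this name (commit H).
expected=$(git rev-parse refs/remotes/origin/main)

git commit -q --amend --allow-empty -m "H' (fixed)"
fixed=$(git rev-parse HEAD)

# Their main still holds H, so the lease matches and H' replaces it.
if git push -q --force-with-lease origin main 2>/dev/null; then
    outcome=accepted
else
    outcome=rejected
fi

remote_tip=$(git --git-dir="$work/origin.git" rev-parse refs/heads/main)
echo "expected old value: $expected"
echo "push with lease:    $outcome"
```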


[6] This depends on their Git having the "quarantine" feature in it, but anyone who has force-with-lease has this feature, I think. The quarantine feature dates back quite a while. Really-old versions of Git that lack the quarantine feature can leave the pushed commits around until a git gc collects them, even if they've never been incorporated.


This finally brings us to --force-if-includes

The example use case with --force-with-lease above shows how we replace a bad commit we made, when we figured that out ourselves. All we did was replace it and push. But this isn't how people always work.

Suppose we make a bad commit, exactly as before. We wind up in this situation in our own local repository:

...--F--G--H'  <-- main
         \
          H   <-- origin/main

But now we run git fetch origin. Perhaps we're trying to be conscientious; perhaps we're under stress and making mistakes. Whatever is going on, we now get:

...--F--G--H'  <-- main
         \
          H--I   <-- origin/main

in our own repository.

If we use git push --force-with-lease=main:<hash-of-H> origin main, the push will fail—like it should—because we explicitly state that we expect origin's main to contain hash ID H. As we can see from our git fetch, though, it actually has hash ID I. If we use the simpler:

git push --force-with-lease origin main

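we'll ask the hosting-provider Git to swap out their main for commit H' if they have commit I as their last commit, because the fetch moved our origin/main to I and that name supplies the default expected value. Which, as we can see, they do: we got commit I into our repository. We just forgot to incorporate it into our own main.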

So, our force-with-lease works and we wipe out commit I over on origin, all because we ran git fetch and forgot to check the result. The --force-if-includes option is intended to catch these cases.
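This failure mode can be reproduced with the same throwaway setup as before, adding only the stray git fetch. As usual, the paths and messages are invented for the sketch.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H (bad)"
git push -q origin main

# Someone else pushes I on top of H.
git clone -q "$work/origin.git" "$work/theirs"
cd "$work/theirs"
git commit -q --allow-empty -m "I"
git push -q origin main
commit_i=$(git rev-parse HEAD)

# We make H', then fetch without looking at what arrived.
cd "$work/ours"
git commit -q --amend --allow-empty -m "H' (fixed)"
git fetch -q origin          # origin/main now points to I

# The lease now expects I, which is what they have, so it "succeeds".
if git push -q --force-with-lease origin main 2>/dev/null; then
    outcome=accepted
else
    outcome=rejected
fi

if git --git-dir="$work/origin.git" merge-base --is-ancestor "$commit_i" refs/heads/main; then
    lost=no
else
    lost=yes
fi
echo "push with lease: $outcome, commit I lost: $lost"
```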

How it actually works is that it depends on Git's reflogs. It scans your own reflog for your main branch and checks whether the tip of your remote-tracking name (origin/main, now commit I) is reachable from any entry there, i.e., whether you ever had I incorporated into your main. Here you never did, so the push is rejected before it can wipe out I. This is similar to the fork-point mode for git rebase (though that one uses your remote-tracking reflog). I'm not 100% convinced, myself, that this --force-if-includes option is going to work in all cases: --fork-point does not, for instance. But it does work in most cases, and I suspect --force-if-includes will too.

So, you can try it out by using it for all --force-with-lease pushes. All it does is add a further check (one the Git folks are hoping will be more reliable, given the way humans are) on top of the atomic "swap out your branch name if this matches" operation that --force-with-lease uses. You can get the same protection manually by providing the =<refname>:<hash> part of --force-with-lease, but the goal is to do it automatically, in a safer way than the current automatic way.
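Here is the same fetch-then-push scenario with --force-if-includes added, as a sketch with invented paths. It assumes Git 2.30 or newer; an older Git does not know the option and will refuse the push for that reason instead.

```shell
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/main
git init -q "$work/ours"
cd "$work/ours"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/origin.git"
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H (bad)"
git push -q origin main

# Someone else pushes I on top of H.
git clone -q "$work/origin.git" "$work/theirs"
cd "$work/theirs"
git commit -q --allow-empty -m "I"
git push -q origin main
commit_i=$(git rev-parse HEAD)

# We make H', fetch (origin/main becomes I), and never integrate I.
cd "$work/ours"
git commit -q --amend --allow-empty -m "H' (fixed)"
git fetch -q origin

# Our reflog for main lists G, H and H', none of which contains I.
git log -g --format='%h %gs' main

# Both options are needed; --force-if-includes alone does nothing.
if git push -q --force-with-lease --force-if-includes origin main 2>/dev/null; then
    outcome=accepted
else
    outcome=rejected
fi

remote_tip=$(git --git-dir="$work/origin.git" rev-parse refs/heads/main)
echo "push with lease + if-includes: $outcome"
```

If you decide you want this on every leased push, setting push.useForceIfIncludes to true in your Git configuration should have the same effect as typing the option each time.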

torek answered Dec 09 '22 09:12