Git pre-commit hook + docker = different git status

I want to run a script inside a pre-commit git hook, and I want that script to run from within a docker image. Example code for this pre-commit hook:

# pre-commit hook
#!/bin/bash
repo_root=$(git rev-parse --show-toplevel)
docker run -v ${repo_root}:${repo_root} -w ${repo_root}  <my_docker_image> <path_to_my_script.py>

my_script.py internally runs git status to determine which files to process in the pre-commit hook.

Problem: when I run git commit --all, the output of git status in the pre-commit hook differs from its output inside the docker container. Example:

# pre-commit hook
#!/bin/bash
git status
echo "------------------------------------"
repo_root=$(git rev-parse --show-toplevel)
docker run -v ${repo_root}:${repo_root} -w ${repo_root} <my_docker_image> git status

I would expect that, when I run git commit --all, git status inside the docker container would show all the changes as staged.

However, the changes are not staged inside the docker container. The hook above prints the following:

Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    modified:   tools/git-hooks/pre-commit
------------------------------------
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   tools/git-hooks/pre-commit

In other words: inside docker, git status does NOT detect the --all option; the changes are not staged.

What am I missing?

user1011113 asked Dec 12 '17 15:12


1 Answer

TL;DR: add -e GIT_INDEX_FILE. However, you may also want to pass through additional Git variables. Alternatively, you could simply forbid such commits, or use an entirely different mechanism (see the last paragraphs of the description below).

Description

When you use git commit --all, Git creates a temporary index to hold the staged files, since they're not already staged in the normal index. This temporary index will eventually become the regular index, if the commit succeeds. Until then, it's not the regular index.
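To see the temporary index at work without involving Docker, here is a minimal sketch. The function name demo_commit_all_index is hypothetical. It builds a throwaway repository, installs a pre-commit hook similar to the one in the question, and runs git commit --all. The hook prints GIT_INDEX_FILE and then runs git status twice: once normally, and once under env -u GIT_INDEX_FILE, which stands in for a container that never received the variable. The sketch assumes git and an env command that supports -u are installed.

```shell
#!/bin/bash
# Sketch (hypothetical helper): reproduce the question's symptom in a
# throwaway repository, using "env -u" instead of Docker to drop the variable.
demo_commit_all_index() {
    local repo
    repo=$(mktemp -d) || return
    (
        cd "$repo" || exit 1
        git init -q .
        git config user.name demo
        git config user.email demo@example.com
        git config commit.gpgsign false
        git config core.hooksPath .git/hooks   # ignore any global hooksPath
        echo one > file.txt
        git add file.txt
        git commit -q -m initial || exit 1

        # Hook: report the index it was given, then run git status with that
        # index and again with GIT_INDEX_FILE removed from the environment.
        mkdir -p .git/hooks
        printf '%s\n' \
            '#!/bin/sh' \
            'exec >&2' \
            'echo "GIT_INDEX_FILE=$GIT_INDEX_FILE"' \
            'git status --porcelain' \
            'echo "--- without GIT_INDEX_FILE ---"' \
            'env -u GIT_INDEX_FILE git status --porcelain' \
            > .git/hooks/pre-commit
        chmod +x .git/hooks/pre-commit

        echo "changed line" > file.txt
        git commit -q --all -m change
    )
}
```

Running demo_commit_all_index should show "M  file.txt" (staged) under the first status and " M file.txt" (not staged) under the second. This is the same split that the question reports.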

Now, your pre-commit script includes the line:

docker run -v ${repo_root}:${repo_root} -w ${repo_root} ...

The -v option bind-mounts the repository directory from the host into the container at the same path, and -w makes that path the container's working directory. However, the container does not inherit the environment of the shell that runs docker run. The only variables it gets are those defined by the image and those passed explicitly with --env or --env-file (the short form of --env is -e).

The top-level git documentation page (git(1)) has a section on environment variables that includes:

GIT_INDEX_FILE

    This environment allows the specification of an alternate index file. If not specified, the default of $GIT_DIR/index is used.

When git commit --all sets up a temporary index, it uses this environment variable to direct all of its sub-commands to look at the temporary index instead of $GIT_DIR/index.
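Applied to the question's hook, the TL;DR fix adds a single option. The following minimal sketch wraps the question's docker run in a hypothetical run_in_container helper. The --rm flag is an addition not in the original; it only deletes the container once the command exits.

```shell
#!/bin/bash
# Sketch (hypothetical helper): the question's docker run, plus
# -e GIT_INDEX_FILE so git inside the container reads the same index as the
# git commit that is running this hook.
run_in_container() {
    local image=$1
    shift
    local repo_root
    repo_root=$(git rev-parse --show-toplevel) || return
    # "-e NAME" without "=value" copies NAME from this environment; if NAME
    # is unset here, the container simply does not get it.
    docker run --rm \
        -e GIT_INDEX_FILE \
        -v "${repo_root}:${repo_root}" \
        -w "${repo_root}" \
        "$image" "$@"
}
```

The hook body then reduces to one call, such as run_in_container <my_docker_image> <path_to_my_script.py>, using the placeholders from the question.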

It's worth noting that Git has also set $GIT_DIR, but it has probably set it to ., so when docker run strips it out, Git looks for the repository starting in the current working directory, which you set via -w to be the repository root—so stripping . turns out to be harmless. Nonetheless, you might want to consider passing all of Git's environment variables through. It's not as simple as blindly passing everything listed in the documentation, though: for instance, if there is a GIT_ALTERNATE_OBJECT_DIRECTORIES path that needs to be exported, you would need to mount each element within that path into the docker instance as well, using more -v options.
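If you do decide to forward more of Git's variables, an explicit allowlist is more cautious than forwarding everything. The following bash sketch is illustrative only. The helper name collect_git_docker_opts, the global array git_docker_opts, and the two-entry allowlist are all choices made for this example, not anything prescribed by Git's documentation. For the reason given above, it also bind-mounts each entry of GIT_ALTERNATE_OBJECT_DIRECTORIES.

```shell
#!/bin/bash
# Sketch (hypothetical helper): build "docker run" options that forward an
# allowlist of Git variables, plus mounts for any alternate object stores.
git_docker_opts=()

collect_git_docker_opts() {
    git_docker_opts=()
    local name dir
    for name in GIT_INDEX_FILE GIT_ALTERNATE_OBJECT_DIRECTORIES; do
        # Forward only variables that are actually set in this hook.
        if [[ -n ${!name+set} ]]; then
            git_docker_opts+=(-e "$name")
        fi
    done
    # Every alternate object directory must also exist inside the container,
    # so mount each colon-separated entry at the same path.
    # (Quoted entries, which Git allows for paths containing colons, are not
    # handled here.)
    if [[ -n ${GIT_ALTERNATE_OBJECT_DIRECTORIES:-} ]]; then
        local IFS=:
        for dir in $GIT_ALTERNATE_OBJECT_DIRECTORIES; do
            if [[ -n $dir ]]; then
                git_docker_opts+=(-v "${dir}:${dir}")
            fi
        done
    fi
}
```

After calling collect_git_docker_opts, splice "${git_docker_opts[@]}" into the docker run command line next to the existing -v and -w options. To forward another path-valued variable, add both an -e for it and a matching mount.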

Fortunately, when GIT_INDEX_FILE is set at this particular point, it's set to a path name of the form .git/temporary-name, so there's no need to mount an additional file system to get the Git temporary index into the running image. This is because the temporary file is going to be renamed to .git/index. Git wants to be sure that the rename can be done as an atomic file-system operation, which requires that the file live on the same file system as .git itself. In fact, for git commit --all it's just .git/index.lock, though git commit --only uses other names: the --only form needs multiple temporary index files, and the one being used to commit is not the one that will become the normal index on success.
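The conclusion that no extra mount is needed depends on the index living under the repository root. You may prefer the hook to fail loudly when that assumption breaks. For example, a linked work-tree keeps its index under the main repository's .git directory, which is outside the mounted path. The following check could run before docker run. It is a sketch with a hypothetical function name, and it assumes the hook runs from the top of the work-tree, as Git arranges for non-bare repositories.

```shell
#!/bin/bash
# Sketch (hypothetical helper): succeed only if the index that git commit is
# using lives under the repository root, i.e. it is already visible through
# the -v "${repo_root}:${repo_root}" bind mount and needs no extra mount.
index_visible_in_mount() {
    local repo_root index
    repo_root=$(git rev-parse --show-toplevel) || return
    # Fall back to the normal index when git commit did not set the variable.
    index=${GIT_INDEX_FILE:-$(git rev-parse --git-dir)/index}
    # A relative path is relative to the hook's working directory.
    case $index in
        /*) ;;
        *) index=$(pwd -P)/$index ;;
    esac
    case $index in
        "$repo_root"/*) return 0 ;;
        *) echo "index $index is outside $repo_root; mount it too" >&2
           return 1 ;;
    esac
}
```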

Finally—and this is entirely independent of Docker—note that it's possible to ask Git to commit staged files that do not match what's in the work-tree right now. For instance, with git add -p, it's easy to store, in the index, a version of a file that has only some of the differences between the HEAD version and the work-tree version. I'm guessing that you plan to have the docker environment run some sort of tests on what is to be committed. That's fine—but note that "what is to be committed" is not necessarily "what is in the work-tree". When using --all and a temporary index, it's the temporary index that contains what is to be committed, and that temporary index was just built from the work-tree, so they will match; but when not using --all, and using the real index, or when using --only with a different temporary index, "what is to be committed" won't necessarily match the work-tree.
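A cheap way for a hook to notice this mismatch is to compare the index with the work-tree. Plain git diff, without --cached, does exactly that. It also honours GIT_INDEX_FILE, so inside the hook it compares against whichever index is being committed. The following is a minimal sketch with a hypothetical helper name. Note that it ignores untracked files.

```shell
#!/bin/bash
# Sketch (hypothetical helper): fail, listing the paths, when the work-tree
# does not match the index that is about to be committed.
warn_if_worktree_differs() {
    # Plain "git diff" compares the index (GIT_INDEX_FILE, if set) with the
    # work-tree; --quiet suppresses output and exits 1 if anything differs.
    if ! git diff --quiet; then
        echo "warning: work-tree differs from what is being committed:" >&2
        git diff --name-only | sed 's/^/  /' >&2
        return 1
    fi
    return 0
}
```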

It's tricky, but not impossible, to write a good pre-commit hook that can see "what is to be committed". One way to do it is to extract whatever is in the index into a temporary directory that is unrelated to the current work-tree. You can then run the testing system over the temporary directory, with no interference from the repository itself and none from the current work-tree. If you did this in your pre-commit script, you could mount (via -v and -w) this temporary directory, and not worry about running any Git commands inside the docker image.
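One way to do the extraction is with git checkout-index, which writes index contents out as files. The --all option selects every path, and --prefix redirects the output below another directory. Like other Git commands, it reads the index named by GIT_INDEX_FILE when that variable is set. The following is a sketch with a hypothetical helper name that prints the path of the snapshot directory.

```shell
#!/bin/bash
# Sketch (hypothetical helper): copy exactly what is staged into a fresh
# temporary directory and print that directory's path.
export_index_to_tmpdir() {
    local tmpdir
    tmpdir=$(mktemp -d) || return
    # --all: every path in the index; --prefix: write below tmpdir instead of
    # the work-tree (the trailing slash makes it a directory prefix).
    if ! git checkout-index --all --prefix="$tmpdir/"; then
        rm -rf "$tmpdir"
        return 1
    fi
    printf '%s\n' "$tmpdir"
}
```

In the hook, you would mount that directory with -v and -w in place of repo_root. You would then remove it afterwards, for example with trap 'rm -rf "$snapshot"' EXIT after snapshot=$(export_index_to_tmpdir).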

Side note: #! lines

Your examples may be modified for StackOverflow posting purposes, but in:

# pre-commit hook
#!/bin/bash
git status

the #! line has become useless. These lines must be the first line of a script. The reason is that the way the kernel (Linux, or the Unixes from which this is derived) runs a script like this is to inspect the first few bytes of the file. If the first two bytes are #!, the rest of the first line—everything (within certain reasonable limits) up to the first newline—is taken as the name of an interpreter, and perhaps options to that interpreter. The kernel then runs the interpreter, rather than the script, passing options from the #! line if any, then the name of the script.

If the first line does not start with #!, though, the kernel just refuses to run the file directly (execve fails with ENOEXEC). The shells discover the error, inspect the file themselves, and decide whether the file is a shell script ... and if so, the shell itself runs the file. The main issue here is that the shell may pick the wrong shell to run the file. Having a #! line with the name of the correct interpreter avoids that.
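You can check a hand-edited hook for this mistake by looking at exactly what the kernel looks at: the first two bytes. The following is a small sketch with a hypothetical helper name.

```shell
#!/bin/bash
# Sketch (hypothetical helper): succeed only if the file's first two bytes
# are "#!", the marker the kernel checks for an interpreter line.
has_shebang() {
    [ "$(head -c 2 -- "$1")" = '#!' ]
}
```

As posted, the question's hook starts with the "# pre-commit hook" comment, so it would fail this check. Moving #!/bin/bash to the very first line makes it pass.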

torek answered Oct 16 '22 04:10