I'm writing a git pre-commit hook in Python, and I'd like to define a blacklist like a .gitignore file to check files against before processing them. Is there an easy way to check whether a file matches a set of .gitignore rules? The rules are kind of arcane, and I'd rather not have to reimplement them.
Assuming you're in the directory containing the .gitignore file, one shell command will list the files Git is tracking (ignored files are not tracked unless they were added with git add -f):
git ls-files
From Python you can simply call (the output goes straight to the terminal):
import os
os.system("git ls-files")
and you can capture the list of files (as bytes objects on Python 3) like so:
import subprocess
list_of_files = subprocess.check_output(["git", "ls-files"]).splitlines()
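If you want to list the untracked files instead, add the option --others:
git ls-files --others
Note that untracked is not the same as ignored: this prints every untracked file. To list only the files that match the ignore rules, add --ignored --exclude-standard as well.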
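As a rough sketch of the same idea from Python, the helper below asks Git for the untracked files that match the ignore rules by passing --others --ignored --exclude-standard. The names split_nul and list_ignored are hypothetical, and the code assumes the git executable is on PATH and that cwd is inside a repository.

```python
import os
import subprocess

def split_nul(raw):
    """Split NUL-terminated Git output into a list of str paths."""
    return [os.fsdecode(p) for p in raw.split(b"\0") if p]

def list_ignored(cwd="."):
    """Return untracked files matching the ignore rules (hypothetical helper)."""
    raw = subprocess.check_output(
        ["git", "ls-files", "--others", "--ignored", "--exclude-standard", "-z"],
        cwd=cwd,
    )
    return split_nul(raw)
```

The -z flag is used so that unusual file names (spaces, newlines) survive the round trip.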
This is rather klunky, but should work: create a temporary Git repository, place your proposed .gitignore in it along with the files to check, and use git status --porcelain on the resulting temporary repository.
This does, however, smell like an XY problem. The klunky solution to Y is probably a poor solution to the real problem X.
So, you have some set of files to lint, probably from inspecting the commit. The following code may be more generic than you need (we don't really need the status part in most cases) but I include it for illustration:
import subprocess
proc = subprocess.Popen(['git',
                         'diff-index',          # use plumbing command, not user diff
                         '--cached',            # compare index vs HEAD
                         '-r',                  # recurse into subdirectories
                         '--name-status',       # show status & pathname
                         # '--diff-filter=AM',  # optional: only A and M files
                         '-z',                  # use machine-readable output
                         'HEAD'],               # the commit to compare against
                        stdout=subprocess.PIPE)
text = proc.stdout.read()
status = proc.wait()
# and check for failure as usual: Git returns 0 on success
Now we need something like pairwise from "Iterating over every two elements in a list":
import sys
if sys.version_info[0] >= 3:
    izip = zip
else:
    from itertools import izip

def pairwise(it):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(it)
    return izip(a, a)
and we can break up the git diff-index output with:
for state, path in pairwise(text.split(b'\0')):
    ...
We now have a state (b'A' = added, b'M' = modified, and so on) for each file. (Be sure to check for state T if you allow symlinks, in case a file changes from ordinary file to symlink, or vice versa. Note that we're depending on pairwise to discard the unpaired empty b'' string at the end of text.split(b'\0'), which is there because Git produces a NUL-terminated list rather than a NUL-separated list.)
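To make the splitting step concrete, here is a small Python 3 sketch run on hand-written sample bytes shaped like the -z output (state, NUL, path, NUL, repeated). The name parse_name_status and the sample data are made up for illustration, and the sketch assumes rename detection is off (no -M), so every record carries exactly one path.

```python
def pairwise(it):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(it)
    return zip(a, a)

def parse_name_status(text):
    """Turn 'git diff-index --name-status -z' output into (state, path) pairs."""
    # split() leaves a trailing b'' after the final NUL; zip() drops it
    # because it has no partner.
    return list(pairwise(text.split(b"\0")))

# Hand-written sample, not real Git output
sample = b"A\0a.py\0M\0dir/b.py\0T\0link\0"
for state, path in parse_name_status(sample):
    print(state, path)
```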
Let's assume that at some point we collect up the files-to-maybe-lint into a list (or iterable) called candidates:
>>> candidates
[b'a.py', b'dir/b.py', b'z.py']
I will assume that you have avoided putting .gitignore into this list-or-iterable, since we plan to take it over for our own purposes.
Now we have two big problems: ignoring some files, and getting the version of those files that will actually be linted.
Just because a file is listed as modified, doesn't mean that the version in the work-tree is the version that will be committed. For instance:
$ git status
$ echo foo >> README
$ git add README
$ echo bar >> README
$ git status --short
MM README
The first M here means that the index version differs from HEAD (this is what we got from git diff-index above) while the second M here means that the index version also differs from the work-tree version.
The version that will be committed is the index version, not the work-tree version. What we need to lint is not the work-tree version but rather the index version.
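For a quick look at what the index holds for a single file, one option (a different technique from the checkout-based approach used in the rest of this answer) is git show with a :path revision, which names the staged copy. This is only a sketch: staged_blob_spec and read_staged are hypothetical names, and the code assumes git is on PATH and the path is given relative to the top of the repository.

```python
import subprocess

def staged_blob_spec(path):
    """Return the revision spec that names the index (stage 0) copy of path."""
    # ':README' means "README as currently staged", not the work-tree file
    return ":" + path

def read_staged(path):
    """Return the staged content of path as bytes (needs a Git repository)."""
    return subprocess.check_output(["git", "show", staged_blob_spec(path)])
```

In the README example above, read_staged("README") would end with the foo line but not the bar line.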
So, now we need a temporary directory. The thing to use here is tempfile.mkdtemp if your Python is old, or the context manager version, tempfile.TemporaryDirectory, if not. Note that we have byte-string pathnames above when working with Python3, and ordinary (string) pathnames when working with Python2, so this also is version dependent.
Since this is ordinary Python, not tricky Git interaction, I leave this part as an exercise, and I'll just gloss right over all the bytes-vs-strings pathname stuff. :-) However, for the --stdin -z bit below, note that Git will need the list of file names as b'\0'-separated bytes.
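One possible shape for that exercise on Python 3 is sketched below. It is an illustration rather than tested hook code: nul_joined is a hypothetical helper, os.fsencode is used so that str and bytes path names can be mixed, and the sample names are the candidates from above.

```python
import os
import tempfile

def nul_joined(paths):
    """Encode each path as bytes and join with NUL for 'checkout-index -z --stdin'."""
    return b"\0".join(os.fsencode(p) for p in paths)

with tempfile.TemporaryDirectory() as tmpdir:
    payload = nul_joined(["a.py", b"dir/b.py", "z.py"])
    # tmpdir exists only inside this block and is deleted on exit,
    # so all Git and lint work has to happen here.
    print(tmpdir, payload)
```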
Once we have the (empty) temporary directory, in a format suitable for passing to cwd= in subprocess.Popen, we now need to run git checkout-index. There are a few options but let's go this way:
import os
proc = subprocess.Popen(['git', 'rev-parse', '--git-dir'],
                        stdout=subprocess.PIPE)
git_dir = proc.stdout.read().rstrip(b'\n')
status = proc.wait()
if status:
    raise ...
if sys.version_info[0] >= 3:  # XXX ugh, but don't want to getcwdb etc
    git_dir = git_dir.decode('utf8')
git_dir = os.path.join(os.getcwd(), git_dir)
proc = subprocess.Popen(['git',
                         '--git-dir={}'.format(git_dir),
                         'checkout-index', '-z', '--stdin'],
                        stdin=subprocess.PIPE, cwd=tmpdir)
proc.stdin.write(b'\0'.join(candidates))
proc.stdin.close()
status = proc.wait()
if status:
    raise ...
Now we want to write our special ignore file to os.path.join(tmpdir, '.gitignore'). Of course we also need tmpdir to act like its own Git repository now. These three things will do the trick:
import shutil
subprocess.check_call(['git', 'init'], cwd=tmpdir)
shutil.copy(os.path.join(git_dir, '.pylintignore'),
            os.path.join(tmpdir, '.gitignore'))
subprocess.check_call(['git', 'add', '-A'], cwd=tmpdir)
as we will now be using Git's ignore rules with the .pylintignore file we copied to .gitignore.
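Now we would just need one more git status pass (with -z for b'\0' style output, like git diff-index) to deal with ignored files; but there's a simpler method. Because git add -A has just staged every non-ignored file, the only untracked files left are the ignored ones, so we can get Git to remove all the ignored files: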
subprocess.check_call(['git', 'clean', '-fqx'], cwd=tmpdir)
shutil.rmtree(os.path.join(tmpdir, '.git'))
os.remove(os.path.join(tmpdir, '.gitignore'))
and now everything in tmpdir is precisely what we should lint.
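Collecting what is left for the linter can be as simple as walking the directory. The helper below is a minimal sketch with a hypothetical name, files_to_lint; it returns paths relative to the temporary directory and assumes the .git directory and .gitignore file have already been removed as shown.

```python
import os

def files_to_lint(root):
    """Return sorted paths, relative to root, of every file under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root))
    return sorted(found)
```

The resulting list could then be handed to the linter with cwd set to the temporary directory.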
Caveat: if your Python linter needs to see imported code, you won't want to remove files. Instead, you'll want to use git status or git diff-index to compute the ignored files. Then you'll want to repeat the git checkout-index, but with the -a option, to extract all files into the temporary directory.
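For the git status route, a sketch of the parsing side might look like the following. It assumes the bytes came from git status --porcelain --ignored -z run in the temporary repository, where ignored entries are marked with !!. The function name and the sample bytes are made up for illustration.

```python
def ignored_from_status(raw):
    """Extract ignored paths from 'git status --porcelain --ignored -z' output."""
    fields = raw.split(b"\0")
    ignored = []
    i = 0
    while i < len(fields):
        entry = fields[i]
        i += 1
        if len(entry) < 4:
            continue  # e.g. the empty field after the final NUL
        # Each entry is a two-character status, a space, then the path
        code, path = entry[:2], entry[3:]
        if b"R" in code or b"C" in code:
            i += 1  # renames and copies carry an extra path field; skip it
        if code == b"!!":
            ignored.append(path)
    return ignored

# Hand-written sample, not real Git output
sample = b"A  a.py\0!! build/\0!! x.pyc\0?? new.txt\0"
print(ignored_from_status(sample))
```

Note that Git may report a wholly ignored directory as a single entry ending in a slash, so the caller has to treat such an entry as covering everything beneath it.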
Once done, just remove the temp directory as usual (always clean up after yourself!).
Note that some parts of the above are tested piecewise, but assembling it all into full working Python2 or Python3 code remains an exercise.