
Is there an easy way to match files against .gitignore rules?

I'm writing a git pre-commit hook in Python, and I'd like to define a blacklist like a .gitignore file to check files against before processing them. Is there an easy way to check whether a file is defined against a set of .gitignore rules? The rules are kind of arcane, and I'd rather not have to reimplement them.

Chris B. asked Dec 13 '16 21:12



2 Answers

Assuming you're in the directory containing the .gitignore file, one shell command will list all the files that Git tracks (ignored files are normally never tracked, so they do not appear):

git ls-files

From Python you can simply call:

import os
os.system("git ls-files")

and you can capture the list of files (as byte strings on Python 3) like so:

import subprocess
# Passing the command as a list avoids going through a shell.
list_of_files = subprocess.check_output(["git", "ls-files"]).splitlines()

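If you want to list the untracked files instead, add the option '--others'. On its own this also includes ignored files; add '--ignored --exclude-standard' to list only the files that the standard ignore rules match:

git ls-files --others --ignored --exclude-standard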
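
A related sketch, not part of the answer above: git ls-files can also apply gitignore-style patterns from an arbitrary file through its --exclude-from option, which may suit a separate blacklist file. The function names and the pattern file name in the usage note are illustrative assumptions.

```python
import subprocess

def split_nul(data):
    """Split NUL-terminated output from `git ... -z` into a list of byte paths."""
    return [p for p in data.split(b'\0') if p]

def tracked_files_matching(pattern_file):
    """Return tracked files that match the gitignore-style patterns in pattern_file.

    Assumes the current directory is inside a Git work tree and that
    pattern_file (a hypothetical blacklist) is readable from there.
    """
    out = subprocess.check_output(
        ['git', 'ls-files', '-z',
         '--cached', '--ignored',            # tracked files matching a pattern
         '--exclude-from=' + pattern_file])  # read patterns from this file
    return split_nul(out)
```

For example, tracked_files_matching('.pylintignore') would return the tracked paths matched by that file's patterns, as byte strings.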
Eddy answered Oct 19 '22 21:10



This is rather klunky, but should work:

  • create a temporary git repository
  • populate it with your proposed .gitignore
  • also populate it with one file per pathname
  • use git status --porcelain on the resulting temporary repository
  • empty it out (remove it entirely, or preserve it as empty for the next pass, whichever seems more appropriate).
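
The steps above can be sketched roughly as follows. This is a minimal illustration, assuming Python 3, git on the PATH, and relative pathnames that stay inside the temporary directory and do not include .gitignore itself; the function names are hypothetical.

```python
import os
import subprocess
import tempfile

def ignored_from_status(porcelain_z):
    """Return paths flagged '!!' (ignored) in `git status --porcelain -z` output."""
    return [e[3:] for e in porcelain_z.split(b'\0') if e.startswith(b'!! ')]

def match_ignore_rules(rules_text, paths):
    """Return the subset of relative paths that the given rules would ignore."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create the temporary repository.
        subprocess.check_call(['git', 'init', '-q'], cwd=tmp)
        # Populate it with the proposed ignore rules.
        with open(os.path.join(tmp, '.gitignore'), 'w') as f:
            f.write(rules_text)
        # One empty placeholder file per pathname.
        for path in paths:
            full = os.path.join(tmp, path)
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, 'w').close()
        # Ask Git which files it considers ignored, listing files individually.
        out = subprocess.check_output(
            ['git', 'status', '--porcelain', '-z', '--ignored',
             '--untracked-files=all'], cwd=tmp)
    # The temporary repository is removed when the with-block exits.
    return [os.fsdecode(p) for p in ignored_from_status(out)]
```

A call such as match_ignore_rules('*.pyc\n', ['a.py', 'a.pyc']) would then report which of the two names the rules match.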

This does, however, smell like an XY problem. The klunky solution to Y is probably a poor solution to the real problem X.

Post-comment answer with details (and side notes)

So, you have some set of files to lint, probably from inspecting the commit. The following code may be more generic than you need (we don't really need the status part in most cases) but I include it for illustration:

import subprocess

proc = subprocess.Popen(['git',
     'diff-index',                        # use plumbing command, not user diff
     '--cached',                          # compare index vs HEAD
     '-r',                                # recurse into subdirectories
     '--name-status',                     # show status & pathname
     # '--diff-filter=AM',                # optional: only A and M files
     '-z',                                # use machine-readable output
     'HEAD'],                             # the commit to compare against
     stdout=subprocess.PIPE)
text = proc.stdout.read()
status = proc.wait()
# and check for failure as usual: Git returns 0 on success

Now we need something like pairwise from the question "Iterating over every two elements in a list":

import sys

if sys.version_info[0] >= 3:
    izip = zip
else:
    from itertools import izip
def pairwise(it):
    "s -> (s0, s1), (s2, s3), (s4, s5), ..."
    a = iter(it)
    return izip(a, a)

and we can break up the git diff-index output with:

for state, path in pairwise(text.split(b'\0')):
    ...

We now have a state (b'A' = added, b'M' = modified, and so on) for each file. (Be sure to check for state T if you allow symlinks, in case a file changes from ordinary file to symlink, or vice versa. Note that we're depending on pairwise to discard the unpaired empty b'' string at the end of text.split(b'\0'), which is there because Git produces a NUL-terminated list rather than a NUL-separated list.)
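
As a variation on the pairing described above, here is a small sketch that strips the trailing empty field explicitly instead of relying on zip to drop it, and complains if the fields do not pair up. It assumes Python 3 and that rename detection is off (so every record is exactly one status plus one path); parse_name_status is a hypothetical helper name.

```python
def parse_name_status(text):
    """Parse `git diff-index --name-status -z` output into (state, path) pairs."""
    fields = text.split(b'\0')
    # Git NUL-terminates the list, so the final split element is empty.
    if fields and fields[-1] == b'':
        fields.pop()
    if len(fields) % 2:
        raise ValueError('expected alternating status and path fields')
    # Even positions hold the status letters, odd positions the pathnames.
    return list(zip(fields[0::2], fields[1::2]))

pairs = parse_name_status(b'A\0a.py\0M\0dir/b.py\0')
candidates = [path for state, path in pairs if state in (b'A', b'M')]
```

Here candidates ends up holding the added and modified paths as byte strings.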

Let's assume that at some point we collect up the files-to-maybe-lint into a list (or iterable) called candidates:

>>> candidates
[b'a.py', b'dir/b.py', b'z.py']

I will assume that you have avoided putting .gitignore into this list-or-iterable, since we plan to take it over for our own purposes.

Now we have two big problems: ignoring some files, and getting the version of those files that will actually be linted.

Just because a file is listed as modified, doesn't mean that the version in the work-tree is the version that will be committed. For instance:

$ git status
$ echo foo >> README
$ git add README
$ echo bar >> README
$ git status --short
MM README

The first M here means that the index version differs from HEAD (this is what we got from git diff-index above) while the second M here means that the index version also differs from the work-tree version.

The version that will be committed is the index version, not the work-tree version. What we need to lint is not the work-tree version but rather the index version.
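
To make the two status columns concrete, here is a minimal sketch that splits a git status --short line and flags files whose staged version differs from the work-tree copy. It assumes simple, unquoted pathnames and no rename entries; the helper names are hypothetical. The last function reads the staged contents with git cat-file, using the ":path" syntax that names a blob in the index.

```python
import subprocess

def split_short_status(line):
    """Split one `git status --short` line into (index, worktree, path)."""
    # Column 1 is the index state, column 2 the work-tree state, then a space.
    return line[0], line[1], line[3:]

def worktree_differs_from_staged(line):
    """True when something is staged and the work-tree copy has changed since."""
    index_state, worktree_state, _ = split_short_status(line)
    return index_state not in ' ?!' and worktree_state != ' '

def read_staged(path):
    """Return the raw index (to-be-committed) contents of path as bytes.

    Assumes path is given relative to the top of the work tree.
    """
    return subprocess.check_output(['git', 'cat-file', 'blob', ':' + path])
```

For the README example above, worktree_differs_from_staged('MM README') is true, which is exactly the case where linting the work-tree file would check the wrong content.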

So, now we need a temporary directory. The thing to use here is tempfile.mkdtemp if your Python is old, or the context-manager version, tempfile.TemporaryDirectory (Python 3.2 and later), if not. Note that we have byte-string pathnames above when working with Python3, and ordinary (string) pathnames when working with Python2, so this is also version dependent.

Since this is ordinary Python, not tricky Git interaction, I leave this part as an exercise, and I'll just gloss right over all the bytes-vs-strings pathname stuff. :-) However, for the --stdin -z bit below, note that Git will need the list of file names as b'\0'-separated bytes.
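
One possible shape for that exercise on Python 3, offered as a sketch rather than the answer's own code: os.fsencode accepts both str and bytes pathnames, which sidesteps most of the bytes-vs-strings question, and tempfile.TemporaryDirectory removes the directory on exit. The name nul_join and the sample paths are illustrative.

```python
import os
import tempfile

def nul_join(paths):
    """Encode str or bytes pathnames and join them with NUL bytes for --stdin -z."""
    return b'\0'.join(os.fsencode(p) for p in paths)

with tempfile.TemporaryDirectory() as tmpdir:
    # tmpdir is a str path, suitable for passing as cwd= to subprocess.Popen.
    payload = nul_join(['a.py', b'dir/b.py', 'z.py'])
    existed = os.path.isdir(tmpdir)
# Leaving the with-block deletes tmpdir and everything in it.
```

Any Git commands that use the directory need to run inside the with-block, since the directory is gone afterwards.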

Once we have the (empty) temporary directory, in a format suitable for passing to cwd= in subprocess.Popen, we now need to run git checkout-index. There are a few options but let's go this way:

import os

proc = subprocess.Popen(['git', 'rev-parse', '--git-dir'],
    stdout=subprocess.PIPE)
git_dir = proc.stdout.read().rstrip(b'\n')
status = proc.wait()
if status:
    raise ...
if sys.version_info[0] >= 3:  # XXX ugh, but don't want to getcwdb etc
    git_dir = git_dir.decode('utf8')
git_dir = os.path.join(os.getcwd(), git_dir)

proc = subprocess.Popen(['git',
    '--git-dir={}'.format(git_dir),
    'checkout-index', '-z', '--stdin'],
    stdin=subprocess.PIPE, cwd=tmpdir)
proc.stdin.write(b'\0'.join(candidates))
proc.stdin.close()
status = proc.wait()
if status:
    raise ...

Now we want to write our special ignore file to os.path.join(tmpdir, '.gitignore'). Of course we also need tmpdir to act like its own Git repository now. These three things will do the trick:

import shutil

subprocess.check_call(['git', 'init'], cwd=tmpdir)
shutil.copy(os.path.join(git_dir, '.pylintignore'),
    os.path.join(tmpdir, '.gitignore'))
subprocess.check_call(['git', 'add', '-A'], cwd=tmpdir)

as we will now be using Git's ignore rules with the .pylintignore file we copied to .gitignore.

Now we would just need one more git status pass (with -z for b'\0' style output, like git diff-index) to deal with ignored files; but there's a simpler method. We can get Git to remove all the ignored files:

subprocess.check_call(['git', 'clean', '-fqx'], cwd=tmpdir)
shutil.rmtree(os.path.join(tmpdir, '.git'))
os.remove(os.path.join(tmpdir, '.gitignore'))

and now everything in tmpdir is precisely what we should lint.

Caveat: if your Python linter needs to see imported code, you won't want to remove files. Instead, you'll want to use git status or git diff-index to compute the ignored files. Then you'll want to repeat the git checkout-index, but with the -a option, to extract all files into the temporary directory.

Once done, just remove the temp directory as usual (always clean up after yourself!).

Note that some parts of the above are tested piecewise, but assembling it all into full working Python2 or Python3 code remains an exercise.

torek answered Oct 19 '22 20:10
