I need to list all files in the current directory (.), including all subdirectories, and exclude some files the way .gitignore does (http://git-scm.com/docs/gitignore).
With fnmatch (https://docs.python.org/2/library/fnmatch.html) I can "filter" files using a pattern:
```python
import fnmatch
import os

ignore_files = ['*.jpg', 'foo/', 'bar/hello*']
matches = []
for root, dirnames, filenames in os.walk('.'):
    for filename in fnmatch.filter(filenames, '*'):
        matches.append(os.path.join(root, filename))
```
How can I filter this so that I get all the files that don't match any of the patterns in my ignore_files?
Thanks!
The .gitignore file is a text file that tells Git which files or folders to ignore in a project. A local .gitignore file is usually placed in the root directory of a project.
If you want to ignore a file that you've committed in the past, you'll need to delete the file from your repository and then add a .gitignore rule for it. Using the --cached option with git rm means that the file will be deleted from your repository, but will remain in your working directory as an ignored file.
No, gitignore doesn't support regexes; it only supports Unix fnmatch-style (glob) patterns.
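One caveat: Python's fnmatch module is not a full gitignore matcher. Per the gitignore documentation, an asterisk there matches anything except a slash, while Python's `*` also matches `/`; fnmatch also has no notion of directory-only patterns such as `foo/`. A quick check, using `fnmatch.fnmatchcase` so the result doesn't depend on the platform's case rules (the paths are made up):

```python
from fnmatch import fnmatchcase

# In Python's fnmatch, "*" also matches "/", so this pattern reaches
# into subdirectories of doc/ (gitignore's "*" would stop at the slash)
print(fnmatchcase("doc/sub/a.txt", "doc/*.txt"))  # True

# A trailing "/" is just a literal character to fnmatch, so a
# directory-style pattern like "foo/" never matches a file path
print(fnmatchcase("foo/x.txt", "foo/"))  # False
```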
You're on the right track: if you want to use fnmatch-style patterns, you should use fnmatch.filter with them.
But there are three problems that make this not quite trivial.
First, you want to apply multiple filters. How do you do that? Call filter multiple times:
```python
for ignore in ignore_files:
    filenames = fnmatch.filter(filenames, ignore)
```
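To see why the next point matters, here is what repeated filter calls do on a few made-up names: each pass keeps only the names that match, so the chain ends up with the names matching every pattern, rather than dropping the names that match any of them.

```python
import fnmatch

filenames = ["a.jpg", "b.txt", "hello.jpg"]
ignore_files = ["*.jpg", "hello*"]

# Each filter() call keeps only matching names, so chaining them
# narrows the list to names that match ALL of the patterns
for ignore in ignore_files:
    filenames = fnmatch.filter(filenames, ignore)

print(filenames)  # ['hello.jpg']
```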
Second, you actually want to do the reverse of filter: return the subset of names that don't match. As the documentation explains:
It is the same as [n for n in names if fnmatch(n, pattern)], but implemented more efficiently.
So, to do the opposite, you just throw in a not:
```python
for ignore in ignore_files:
    filenames = [n for n in filenames if not fnmatch.fnmatch(n, ignore)]
```
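Wrapped in a small function (the name `exclude` is just an illustrative choice), the inverted filter can be checked on a few made-up names. It calls `fnmatch.fnmatch` explicitly because the question imports the module rather than the function:

```python
import fnmatch

def exclude(names, patterns):
    """Return the names that match none of the fnmatch-style patterns."""
    for pattern in patterns:
        # Keep only the names that do NOT match this pattern
        names = [n for n in names if not fnmatch.fnmatch(n, pattern)]
    return names

kept = exclude(["a.jpg", "b.txt", "hello.py", "c.md"], ["*.jpg", "hello*"])
print(kept)  # ['b.txt', 'c.md']
```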
Finally, you're attempting to filter on partial pathnames, not just filenames, but you're not doing the join until after the filtering. So switch the order:
```python
for root, dirnames, filenames in os.walk('.'):
    filenames = [os.path.join(root, filename) for filename in filenames]
    for ignore in ignore_files:
        filenames = [n for n in filenames if not fnmatch.fnmatch(n, ignore)]
    matches.extend(filenames)
```
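Put together, a self-contained sketch of the whole walk might look like the following. One assumption worth flagging: `os.walk('.')` yields roots such as `./bar`, so the joined paths start with `./`, and a pattern like `bar/hello*` (which fnmatch has to match against the whole string) would not match `./bar/hello.txt`. The sketch therefore builds paths relative to the starting directory with `os.path.relpath`. The function name `find_unignored` and the sample tree are hypothetical, and `/` separators (as on POSIX) are assumed.

```python
import fnmatch
import os
import tempfile

def find_unignored(top, ignore_files):
    """Return paths of files under `top`, relative to it, that match none
    of the fnmatch-style patterns in `ignore_files`."""
    matches = []
    for root, dirnames, filenames in os.walk(top):
        # Relative paths let "bar/hello*" match "bar/hello.txt"
        # instead of having to match "./bar/hello.txt"
        rel_root = os.path.relpath(root, top)
        paths = [os.path.normpath(os.path.join(rel_root, f)) for f in filenames]
        for ignore in ignore_files:
            paths = [p for p in paths if not fnmatch.fnmatch(p, ignore)]
        matches.extend(paths)
    return matches

with tempfile.TemporaryDirectory() as top:
    # Hypothetical sample tree, just for the demonstration
    for rel in ["a.jpg", "keep.txt", "bar/hello.txt", "bar/other.txt", "foo/x.txt"]:
        path = os.path.join(top, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    result = sorted(find_unignored(top, ["*.jpg", "foo/", "bar/hello*"]))

print(result)  # ['bar/other.txt', 'foo/x.txt', 'keep.txt']
```

Note that `foo/x.txt` is still listed: with plain fnmatch, the directory pattern `foo/` never matches a file path.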
There are a few ways you could improve this.
You may want to use a generator expression instead of a list comprehension (parentheses instead of square brackets), so if you have huge lists of filenames you're using a lazy pipeline instead of wasting time and space repeatedly building huge lists.
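One caveat if you simply swap the brackets in the loop from the previous step: a generator expression evaluates its condition lazily, so `ignore` inside `(n for n in filenames if not fnmatch.fnmatch(n, ignore))` is only looked up when the pipeline is consumed. By then the loop has finished and every stage sees the last pattern. Passing each pattern into its own function call binds it per stage. A sketch, where the helper name `exclude_lazy` and the sample names are hypothetical:

```python
import fnmatch

def exclude_lazy(names, pattern):
    """Lazily yield the names that do not match `pattern`.

    Because `pattern` is a parameter, each pipeline stage keeps its own
    pattern instead of sharing the loop variable.
    """
    return (n for n in names if not fnmatch.fnmatch(n, pattern))

filenames = ["a.jpg", "b.txt", "hello.py"]

pipeline = iter(filenames)
for ignore in ["*.jpg", "hello*"]:
    pipeline = exclude_lazy(pipeline, ignore)
result = list(pipeline)  # no matching happens until this point
print(result)  # ['b.txt']

# For contrast: the naive version late-binds `ignore`, so both stages
# use the last pattern ("hello*") and "a.jpg" slips through
naive = iter(filenames)
for ignore in ["*.jpg", "hello*"]:
    naive = (n for n in naive if not fnmatch.fnmatch(n, ignore))
naive_result = list(naive)
print(naive_result)  # ['a.jpg', 'b.txt']
```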
Also, it may or may not be easier to understand if you invert the order of the loops, like this:
```python
filenames = (n for n in filenames
             if not any(fnmatch.fnmatch(n, ignore) for ignore in ignore_files))
```
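The `any()` form can also be applied to directories to get a little closer to gitignore: `os.walk` lets you prune `dirnames` in place so ignored directories are never descended into. The sketch below treats a pattern ending in `/` as directory-only, a rough imitation of gitignore's trailing-slash rule. It is an assumption-laden simplification: no `!` negation, no `**`, and patterns are matched against the path relative to `top`, so unlike git, `foo/` here only prunes a top-level `foo`. The function name `walk_unignored` and the sample tree are hypothetical.

```python
import fnmatch
import os
import tempfile

def walk_unignored(top, ignore_files):
    """Yield paths of files under `top`, relative to it, that are not ignored.

    Patterns ending in "/" are applied only to directories (pruning them);
    the rest are applied only to files.
    """
    dir_patterns = [p.rstrip("/") for p in ignore_files if p.endswith("/")]
    file_patterns = [p for p in ignore_files if not p.endswith("/")]

    def ignored(path, patterns):
        return any(fnmatch.fnmatch(path, p) for p in patterns)

    for root, dirnames, filenames in os.walk(top):
        rel_root = os.path.relpath(root, top)
        # Assigning to dirnames[:] changes the list os.walk recurses into,
        # so ignored directories are never visited
        dirnames[:] = [
            d for d in dirnames
            if not ignored(os.path.normpath(os.path.join(rel_root, d)), dir_patterns)
        ]
        for f in filenames:
            path = os.path.normpath(os.path.join(rel_root, f))
            if not ignored(path, file_patterns):
                yield path

with tempfile.TemporaryDirectory() as top:
    # Hypothetical sample tree, just for the demonstration
    for rel in ["a.jpg", "keep.txt", "bar/hello.txt", "bar/other.txt", "foo/x.txt"]:
        path = os.path.join(top, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    result = sorted(walk_unignored(top, ["*.jpg", "foo/", "bar/hello*"]))

print(result)  # ['bar/other.txt', 'keep.txt']
```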
Finally, if you're worried about performance, you can use fnmatch.translate on each pattern to turn it into an equivalent regexp, then merge them into one big regexp, compile it, and use that instead of a loop around fnmatch. This can get tricky if your patterns are allowed to be more complicated than just *.jpg, and I wouldn't recommend it unless you really do identify a performance bottleneck here. But if you need to do it, I've seen at least one question on SO where someone put a lot of effort into hammering out all the edge cases, so search for that instead of trying to write it yourself.
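A minimal sketch of the combined-regexp idea, assuming a reasonably recent Python 3, where `fnmatch.translate` returns a self-contained, end-anchored regex for each pattern so the pieces can be joined with `|`. It only reproduces fnmatch semantics, not gitignore's extra rules, and the helper name `compile_ignore` is hypothetical:

```python
import fnmatch
import re

def compile_ignore(patterns):
    """Combine fnmatch-style patterns into one compiled regex.

    Like fnmatch.fnmatchcase, the result is case-sensitive on every platform.
    """
    if not patterns:
        # An empty alternation would match everything; use a regex that never matches
        return re.compile(r"(?!)")
    combined = "|".join("(?:%s)" % fnmatch.translate(p) for p in patterns)
    return re.compile(combined)

ignore_re = compile_ignore(["*.jpg", "bar/hello*"])
paths = ["a.jpg", "keep.txt", "bar/hello.txt", "bar/other.txt"]
# re.match anchors at the start; translate() supplies the end anchor
kept = [p for p in paths if not ignore_re.match(p)]
print(kept)  # ['keep.txt', 'bar/other.txt']
```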