
Filtering os.walk() dirs and files

I'm looking for a way to include/exclude file patterns and exclude directories from an os.walk() call.

Here's what I'm doing for now:

```python
import fnmatch
import os

includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']

def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path

        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path

for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))

    for filename in files:
        filename = os.path.join(root, filename)

        print(filename)
```

Is there a better way to do this? How?

asked Feb 28 '11 at 11:02 by Paulo Freitas


People also ask

What does os.walk() do?

os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
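Because the walk is top-down by default, the dirnames list can be modified in place to prune subdirectories before os.walk() descends into them; the `dirs[:] = ...` assignments in the question and the answer rely on this. A minimal sketch, using a throwaway tree built with tempfile (the helper `walk_pruned` and the directory names `keep` and `skip` are purely illustrative):

```python
import os
import tempfile

def walk_pruned(top, skip_name):
    """Return sorted relative file paths under top, never entering skip_name dirs."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Slice assignment mutates the list os.walk() holds, so pruned
        # directories are never visited (this only works with topdown=True).
        dirnames[:] = [d for d in dirnames if d != skip_name]
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), top))
    return sorted(found)

with tempfile.TemporaryDirectory() as top:
    # Build top/keep/a.txt and top/skip/a.txt.
    for sub in ("keep", "skip"):
        os.makedirs(os.path.join(top, sub))
        with open(os.path.join(top, sub, "a.txt"), "w") as fh:
            fh.write("x")
    result = walk_pruned(top, "skip")

print(result)
```

Plain reassignment (`dirnames = [...]`) would only rebind the local name and prune nothing; the slice assignment is what makes the pruning take effect.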

How do you filter a file type in Python?

With the pathlib module, files can be filtered by name using Path.glob(), which takes a glob pattern such as '*.odt' to match names or extensions; Path.rglob() does the same recursively through all subdirectories.
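A short sketch of the pathlib approach for the question's two extensions, assuming a small temporary tree (`find_by_patterns` is a hypothetical helper name). rglob() takes one pattern at a time, so the results for several patterns are merged into a set:

```python
import tempfile
from pathlib import Path

def find_by_patterns(top, patterns):
    """Return sorted relative paths of files under top matching any glob pattern."""
    root = Path(top)
    matches = set()
    for pattern in patterns:
        # rglob() searches recursively; glob() would search only top itself.
        for path in root.rglob(pattern):
            if path.is_file():
                matches.add(path.relative_to(root).as_posix())
    return sorted(matches)

with tempfile.TemporaryDirectory() as top:
    docs = Path(top, "docs")
    docs.mkdir()
    for name in ("a.doc", "b.odt", "c.txt"):
        (docs / name).write_text("x")
    (Path(top) / "d.doc").write_text("x")
    result = find_by_patterns(top, ["*.doc", "*.odt"])

print(result)
```

As far as I know rglob() has no option for skipping whole subtrees, so excluding a directory such as Documents still calls for an os.walk()-style filter.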

What is the difference between os.listdir() and os.walk()?

os.listdir() returns a list of the names of every file and folder in a single directory, without recursing. os.walk() traverses an entire directory tree, yielding the subdirectories and files of each directory it visits.
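To make the contrast concrete, here is a small sketch on a temporary two-level tree (the file names are illustrative): os.listdir() sees only the entries of the top directory, while os.walk() reaches the nested file too.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "sub"))
    for rel in ("top.txt", os.path.join("sub", "nested.txt")):
        with open(os.path.join(top, rel), "w") as fh:
            fh.write("x")

    # os.listdir(): names of the entries directly inside top, no recursion.
    listed = sorted(os.listdir(top))

    # os.walk(): every file in the whole tree, collected across all directories.
    walked = sorted(
        os.path.relpath(os.path.join(dirpath, name), top)
        for dirpath, _dirnames, filenames in os.walk(top)
        for name in filenames
    )

print(listed)
print(walked)
```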

What does os.walk() return in Python?

os.walk() returns a generator, not a list. Each item it yields is a 3-tuple: the path of the current directory, a list of the names of its subdirectories, and a list of the names of the files in that directory.
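A quick way to see the generator behaviour, sketched on a temporary directory with one subdirectory and one file (names are illustrative): each next() call produces the tuple for one directory, starting with top itself.

```python
import os
import tempfile
import types

with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "child"))
    with open(os.path.join(top, "file.txt"), "w") as fh:
        fh.write("x")

    walker = os.walk(top)
    # Calling os.walk() reads nothing yet; the first next() yields top's tuple.
    dirpath, dirnames, filenames = next(walker)
    # Consume the remaining tuples (here only the one for "child").
    rest = list(walker)

print(dirpath == top, dirnames, filenames, len(rest))
```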


1 Answer

This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes includes is only used for files):

```python
import fnmatch
import os
import re

includes = ['*.doc', '*.odt']  # for files only
excludes = ['/home/paulo-freitas/Documents']  # for dirs and files

# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
# r'$.' can never match, so an empty exclude list excludes nothing
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'

for root, dirs, files in os.walk('/home/paulo-freitas'):

    # exclude dirs: test the full path, but keep the bare names os.walk() expects
    dirs[:] = [d for d in dirs if not re.match(excludes, os.path.join(root, d))]

    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]

    for fname in files:
        print(fname)
```
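The same filtering can also be done with fnmatch.fnmatch() and any() directly, skipping the regular-expression step. The sketch below is an alternative, not part of the answer above; `walk_filtered` is a hypothetical helper name, and unlike the answer it matches includes against bare file names rather than full paths. Note that fnmatch.fnmatch() normalizes case per platform (case-insensitive on Windows), whereas re.match() is always case-sensitive; fnmatch.fnmatchcase() gives the case-sensitive behaviour.

```python
import fnmatch
import os
import tempfile

def walk_filtered(top, includes, excludes):
    """Yield full paths of files under top whose name matches an include pattern.

    Directories and files whose full path matches an exclude pattern are
    skipped; excluded directories are pruned so their contents are never read.
    """
    def excluded(path):
        return any(fnmatch.fnmatch(path, pat) for pat in excludes)

    for root, dirs, files in os.walk(top):
        # Prune in place, keeping bare names as os.walk() expects.
        dirs[:] = [d for d in dirs if not excluded(os.path.join(root, d))]
        for name in files:
            path = os.path.join(root, name)
            if excluded(path):
                continue
            if any(fnmatch.fnmatch(name, pat) for pat in includes):
                yield path

with tempfile.TemporaryDirectory() as top:
    docs = os.path.join(top, "Documents")
    work = os.path.join(top, "work")
    os.makedirs(docs)
    os.makedirs(work)
    for path in (os.path.join(docs, "old.doc"),
                 os.path.join(work, "report.doc"),
                 os.path.join(work, "notes.odt"),
                 os.path.join(work, "todo.txt")):
        with open(path, "w") as fh:
            fh.write("x")
    # Exclude the Documents directory by its full path, as in the question.
    found = sorted(os.path.relpath(p, top)
                   for p in walk_filtered(top, ["*.doc", "*.odt"], [docs]))

print(found)
```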
answered Sep 21 '22 at 06:09 by Oben Sonne