How do I search directories and find files that match a regex?

Question

I recently started getting into Python and I am having a hard time searching through directories and matching files based on a regex that I have created.

Basically, I want it to scan through all the directories inside another directory, find all the files that end with .zip, .rar or .r01, and then run various commands depending on which kind of file it is.

import os, re

rootdir = "/mnt/externa/Torrents/completed"

for subdir, dirs, files in os.walk(rootdir):
    if re.search('(w?.zip)|(w?.rar)|(w?.r01)', files):
        print "match: " . files

Jonas · Accepted Answer

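import os
import re

rootdir = "/mnt/externa/Torrents/completed"
# One alternative per extension; each group matches a whole file name ending in that suffix.
regex = re.compile(r'(.*zip$)|(.*rar$)|(.*r01$)')

# os.walk yields (directory, subdirectory names, file names) for every directory under rootdir.
for root, dirs, files in os.walk(rootdir):
    for file in files:
        if regex.match(file):
            print(file)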
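
The question's code passed the whole `files` list to `re.search`, which expects a single string; the answer above fixes that by looping over the names. As a minimal sketch of the same idea, the variant below also escapes the dot (so a name like `notazip` is not matched) and yields full paths rather than bare file names. The function name `find_archives`, the case-insensitive flag and the temporary demo directory are assumptions added for illustration, not part of the original answer.

```python
import os
import re
import tempfile

# Escaped dot so only a real ".zip"/".rar"/".r01" suffix matches.
# re.IGNORECASE is an assumption; drop it if case should matter.
ARCHIVE_RE = re.compile(r".*\.(zip|rar|r01)$", re.IGNORECASE)


def find_archives(rootdir):
    """Yield the full path of every matching file under rootdir."""
    for root, dirs, files in os.walk(rootdir):
        for name in files:
            if ARCHIVE_RE.match(name):
                yield os.path.join(root, name)


# Self-contained demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for name in ("a.zip", "notazip", os.path.join("sub", "b.RAR")):
        open(os.path.join(tmp, name), "w").close()
    found = sorted(os.path.relpath(p, tmp) for p in find_archives(tmp))
    print(found)
```

Joining `root` and the file name matters as soon as you want to open or unpack the file, because `os.walk` only reports names relative to the directory it is currently visiting.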

The code below answers the question in the following comment:

That worked really well. Is there a way to do one thing if the match is found on regex group 1, another thing if the match is found on regex group 2, etc.? – nillenilsson

import os
import re

# Group 1 matches zip files, group 2 rar files, group 3 r01 files.
regex = re.compile(r'(.*zip$)|(.*rar$)|(.*r01$)')

for root, dirs, files in os.walk("../Documents"):
    for file in files:
        res = regex.match(file)
        if res:
            # Only one alternative can match, so at most one group is set.
            if res.group(1):
                print("ZIP", file)
            elif res.group(2):
                print("RAR", file)
            elif res.group(3):
                print("R01", file)

It might be possible to do this in a nicer way, but this works.
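
One possibly nicer way, sketched here under assumptions, is to give each alternative a name and look up a handler by the name of the group that matched, using the match object's `lastgroup` attribute. The handler functions and the `dispatch` helper are hypothetical placeholders for whatever commands you actually want to run.

```python
import re

# Each extension gets its own named group; the outer group is non-capturing,
# so match.lastgroup is the name of the alternative that matched.
ARCHIVE_RE = re.compile(r".*\.(?:(?P<zip>zip)|(?P<rar>rar)|(?P<r01>r01))$")


# Hypothetical handlers; replace the bodies with the real commands.
def handle_zip(name):
    return f"ZIP {name}"


def handle_rar(name):
    return f"RAR {name}"


def handle_r01(name):
    return f"R01 {name}"


HANDLERS = {"zip": handle_zip, "rar": handle_rar, "r01": handle_r01}


def dispatch(name):
    """Run the handler for name's extension, or return None if it does not match."""
    match = ARCHIVE_RE.match(name)
    if match is None:
        return None
    return HANDLERS[match.lastgroup](name)


print(dispatch("movie.rar"))
print(dispatch("notes.txt"))
```

Adding a new file type then means adding one named group and one dictionary entry instead of another `if` branch.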

Avi Vajpeyi · Answer

Given that you are a beginner, I would recommend using glob in place of a quickly written file-walking-regex matcher.

Snippets of functions using `glob` and a `file-walking-regex matcher`

The snippet below contains two file-searching functions (one using glob and the other using a custom file-walking-regex matcher). It also contains a "stopwatch" decorator to time the two functions.

import glob
import os
import re
import time
from datetime import timedelta

def stopwatch(method):
    def timed(*args, **kw):
        ts = time.perf_counter()
        result = method(*args, **kw)
        te = time.perf_counter()
        duration = timedelta(seconds=te - ts)
        print(f"{method.__name__}: {duration}")
        return result
    return timed

@stopwatch
def get_filepaths_with_oswalk(root_path: str, file_regex: str):
    files_paths = []
    pattern = re.compile(file_regex)
    for root, directories, files in os.walk(root_path):
        for file in files:
            if pattern.match(file):
                files_paths.append(os.path.join(root, file))
    return files_paths


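@stopwatch
def get_filepaths_with_glob(root_path: str, file_pattern: str):
    # file_pattern is a shell-style glob (e.g. '*.zip'), not a regular expression.
    # Without '**' and recursive=True, glob does not descend into subdirectories.
    return glob.glob(os.path.join(root_path, file_pattern))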

Comparing runtimes of the above functions

Using the two functions above to find 5,076 files in a directory called root_path (containing 66,948 files), with the glob pattern filename_*.csv for the glob version and the regex filename_(.*).csv for the os.walk version:

>>> glob_files = get_filepaths_with_glob(root_path, 'filename_*.csv')
get_filepaths_with_glob: 0:00:00.176400

>>> oswalk_files = get_filepaths_with_oswalk(root_path,'filename_(.*).csv')
get_filepaths_with_oswalk: 0:03:29.385379

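The glob method is much faster and the code for it is shorter. Keep in mind, though, that glob.glob as called here only looks inside root_path itself, whereas os.walk descends into every subdirectory, so the two timings do not necessarily measure the same amount of work.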

For your case

You can probably use something like the following to get your *.zip, *.rar and *.r01 files:

files = []
for ext in ['*.zip', '*.rar', '*.r01']:
    files += get_filepaths_with_glob(root_path, ext)
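
Since the question asks about files in nested directories, here is a minimal sketch of a recursive variant. It relies on the `**` wildcard together with `recursive=True`, which makes `glob.glob` match at any depth below the starting directory. The function name `find_by_extension` and the temporary demo directory are assumptions for illustration; note also that glob skips hidden files and directories by default.

```python
import glob
import os
import tempfile


def find_by_extension(root_path, patterns=("*.zip", "*.rar", "*.r01")):
    """Return files under root_path, at any depth, that match any glob pattern."""
    matches = []
    for pattern in patterns:
        # "**" spans zero or more directory levels, but only with recursive=True.
        matches += glob.glob(os.path.join(root_path, "**", pattern), recursive=True)
    return sorted(matches)


# Self-contained demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub", "deep"))
    for name in ("a.zip", "readme.txt", os.path.join("sub", "deep", "b.r01")):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.relpath(p, tmp) for p in find_by_extension(tmp)]
    print(found)
```

A recursive glob has to visit the whole tree just as `os.walk` does, so its runtime is likely to be closer to the walking version than the non-recursive timing above suggests.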

Tags: python, regex, file, linux, directory
