
Avoid the base path in a recursive tree walking

I know how to recursively list all files/folders of d:\temp with various methods; see How to use glob() to find files recursively?.
But I often want to avoid the d:\temp\ prefix in the results and get paths relative to this base instead.

This can be done with:

  • import os, glob
    for f in glob.glob('d:\\temp\\**\\*', recursive=True):
        print(os.path.relpath(f, 'd:\\temp'))
    
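  • the same loop with f.removeprefix('d:\\temp\\') (Python 3.9+), which removes this prefix; note that f.lstrip('d:\\temp\\') is not equivalent, because lstrip strips a set of characters rather than a prefix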

  • import pathlib
    root = pathlib.Path("d:\\temp")
    print([p.relative_to(root) for p in root.glob("**/*")])
    

These 3 solutions work. But if you read the source of glob.py, it accumulates/joins all the parts of the path. So each solution above amounts to "removing something that was just added before"! It works, but it's not very elegant. The same goes for pathlib, where relative_to removes the prefix.
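One caveat when stripping the prefix by hand: str.lstrip treats its argument as a set of characters, not as a prefix, so it can also eat the beginning of a file name. The sketch below illustrates this on a made-up path; strip_prefix is a hypothetical helper name, not something from the standard library.

```python
def strip_prefix(path, prefix):
    # Hypothetical helper: drop `prefix` only when `path` really starts with it.
    # On Python 3.9+, path.removeprefix(prefix) does the same thing.
    return path[len(prefix):] if path.startswith(prefix) else path

prefix = 'd:\\temp\\'
full = 'd:\\temp\\test.txt'  # made-up example path

# lstrip removes any leading run of the characters d : \ t e m p,
# so it keeps going into the file name until it hits the "s".
lstripped = full.lstrip(prefix)
stripped = strip_prefix(full, prefix)

print(lstripped)  # st.txt
print(stripped)   # test.txt
```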

Question: how can the next few lines be modified so that d:\temp does not appear in the output (without removing something that was concatenated before!)?

import os

def listpath(path):
    for f in os.scandir(path):
        f2 = os.path.join(path, f)
        if os.path.isdir(f):
            yield f2
            yield from listpath(f2)
        else:
            yield f2

for f in listpath('d:\\temp'):
    print(f)

#d:\temp\New folder
#d:\temp\New folder\New Text Document - Copy.txt
#d:\temp\New folder\New Text Document.txt
#d:\temp\New Text Document - Copy.txt
#d:\temp\New Text Document.txt
Basj, asked Dec 03 '20 21:12


1 Answer

You can do something like the following example. Basically, we recursively build each path by joining its parts together, but we never join the initial root into the result; root is only used to locate the directory to scan.

import os

def listpath(root, parent=''):
    scan = os.path.join(root, parent)
    for f in os.scandir(scan):
        f2 = os.path.join(parent, f.name)
        yield f2
        if f.is_dir():
            yield from listpath(root, f2)

for f in listpath('d:\\temp'):
    print(f)
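To check the behavior without touching d:\temp, here is a minimal sketch that runs the same idea against a throwaway directory. The folder and file names are made up for the test, and the only change to the function is a with block around os.scandir so the directory handle is closed promptly.

```python
import os
import tempfile

def listpath(root, parent=''):
    # Scan root/parent, but build the yielded paths from `parent` only,
    # so `root` never appears in the results.
    with os.scandir(os.path.join(root, parent)) as entries:
        for entry in entries:
            rel = os.path.join(parent, entry.name)
            yield rel
            if entry.is_dir():
                yield from listpath(root, rel)

with tempfile.TemporaryDirectory() as root:
    # Made-up tree: one sub-folder, one file at each level.
    os.mkdir(os.path.join(root, 'New folder'))
    for rel in ('New Text Document.txt',
                os.path.join('New folder', 'New Text Document.txt')):
        open(os.path.join(root, rel), 'w').close()
    found = sorted(listpath(root))

for rel in found:
    print(rel)
```

Every printed path is relative to the temporary root, and nothing is stripped after the fact.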

Python 3.10 adds a new root_dir option to glob.glob, which allows you to do this with the built-in glob with no problem:

import glob
glob.glob('**/*', root_dir='d:\\temp', recursive=True)

You could also use a third-party library such as wcmatch (of which I am the author), which has already implemented this behavior. But in this simple case, your listpath approach may be sufficient.

facelessuser, answered Nov 13 '22 17:11