python os.walk to certain level [duplicate]

I want to build a program that uses some basic code to read through a folder and tell me how many files are in the folder. Here is how I do that currently:

import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)

This works great until there are multiple folders inside the "main" folder, because poor folder/file management means it can return a long, junky list of files. So I would like to go down to the second level at most. Example:

Main Folder
---file_i_want
---file_i_want
---Sub_Folder
------file_i_want <--*
------file_i_want <--*
------Sub_Folder_2
---------file_i_dont_want
---------file_i_dont_want

I know how to go to only the first level with a break and with del dirs[:] taken from this post and also this post.

import os

folders = ['Y:\\path1', 'Y:\\path2', 'Y:\\path3']
for stuff in folders:
    for root, dirs, files in os.walk(stuff, topdown=True):
        print("there are", len(files), "files in", root)
        del dirs[:] # or a break here. does the same thing.

But no matter how much I search, I can't find out how to go two layers deep. I may just not be understanding the other posts on it. I was thinking of something like del dirs[:2], but to no avail. Can someone guide me or explain to me how to accomplish this?

asked Mar 10 '17 14:03 by MattR


1 Answer

You could do it like this:

import os

depth = 2

# [1] abspath() also normalizes the path like normpath(), removing any trailing os.sep;
#     the trailing separator must be gone for the slicing below to be accurate.
# [2] abspath() also resolves "/../", "////" and "." components, which os.walk
#     would otherwise return literally.
# [3] expanduser() expands ~
# [4] expandvars() expands $HOME
# WARNING: don't use [3] expanduser() or [4] expandvars() if stuff contains
# arbitrary strings outside your control.
#stuff = os.path.expanduser(os.path.expandvars(stuff)) # if trusted source
stuff = os.path.abspath(stuff)

for root, dirs, files in os.walk(stuff):
    # number of separators left after stripping the stuff prefix = depth of root
    if root[len(stuff):].count(os.sep) < depth:
        for f in files:
            print(os.path.join(root, f))

The key is: if root[len(stuff):].count(os.sep) < depth

It strips stuff from the start of root, so the result is relative to stuff. Then it just counts the number of path separators, which equals the number of levels root sits below stuff.
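To make the counting concrete, here is a small sketch of what that expression evaluates to for a few paths. The helper name relative_depth and the folder names are made up for illustration; they are not part of the answer above.

```python
import os

def relative_depth(base, root):
    """Count the separators left in root once the base prefix is sliced off."""
    return root[len(base):].count(os.sep)

# Hypothetical starting folder, built with os.path.join so it works on any OS.
base = os.path.join("data", "main")

print(relative_depth(base, base))                               # 0
print(relative_depth(base, os.path.join(base, "sub")))          # 1
print(relative_depth(base, os.path.join(base, "sub", "sub2")))  # 2
```

With depth = 2, only the first two paths pass the < depth test, so files in sub2 are skipped.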

depth behaves like the -maxdepth option of the Linux find command: a depth of 0 lists nothing, 1 lists only the files directly in the starting folder, and 2 also lists the files in its immediate subdirectories.

Of course, os.walk still traverses the entire tree and only the output is filtered, but unless the tree is very deep that works fine.
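If the full traversal does become a problem, one possible variation is to combine the depth check with the del dirs[:] trick from the question, so that os.walk stops descending once the limit is reached. This is only a sketch: walk_limited is a hypothetical name, it measures depth with os.path.relpath instead of string slicing, and the demo builds a throwaway tree that mirrors the layout in the question.

```python
import os
import tempfile

def walk_limited(top, max_depth):
    """Yield (root, dirs, files) like os.walk, but visit at most max_depth levels."""
    top = os.path.abspath(top)
    for root, dirs, files in os.walk(top, topdown=True):
        rel = os.path.relpath(root, top)
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if level < max_depth:
            yield root, dirs, files
        if level >= max_depth - 1:
            del dirs[:]  # in-place edit tells os.walk not to descend any further

# Demo on a temporary tree: Main/a.txt, Main/Sub_Folder/b.txt, Main/Sub_Folder/Sub_Folder_2/c.txt
with tempfile.TemporaryDirectory() as main:
    sub = os.path.join(main, "Sub_Folder")
    deep = os.path.join(sub, "Sub_Folder_2")
    os.makedirs(deep)
    for folder, name in [(main, "a.txt"), (sub, "b.txt"), (deep, "c.txt")]:
        open(os.path.join(folder, name), "w").close()
    found = sorted(f for _, _, files in walk_limited(main, 2) for f in files)

print(found)  # ['a.txt', 'b.txt']
```

Pruning only works with topdown=True, because in bottom-up mode the subdirectories have already been visited by the time dirs is yielded.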

Another solution would be to call os.listdir recursively (with a directory check) up to a maximum recursion level. That's a little more involved than you may need, but since it's not that hard, here's one implementation:

def scanrec(root, max_depth=2):
    rval = []

    def do_scan(start_dir, output, depth=0):
        for f in os.listdir(start_dir):
            ff = os.path.join(start_dir, f)
            if os.path.isdir(ff):
                # descend only if the subdirectory is still within max_depth levels
                if depth + 1 < max_depth:
                    do_scan(ff, output, depth + 1)
            else:
                output.append(ff)

    do_scan(root, rval, 0)
    return rval

print(scanrec(stuff))  # prints the files in stuff and in its immediate subdirectories

Note: os.listdir followed by os.path.isdir costs an extra stat call per entry, so this is not optimal. Since Python 3.5, os.scandir can avoid that extra call.
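A rough sketch of what the os.scandir variant might look like follows. The function name scan_files and its max_depth parameter are assumptions for illustration, with max_depth=2 following the same convention as depth = 2 in the os.walk snippet (the starting folder plus its immediate subdirectories). Using os.scandir as a context manager needs Python 3.6 or later.

```python
import os
import tempfile

def scan_files(start_dir, max_depth=2, depth=0):
    """Return paths of files in start_dir, looking at most max_depth levels down."""
    found = []
    if depth >= max_depth:
        return found
    with os.scandir(start_dir) as entries:
        for entry in entries:
            # DirEntry.is_dir() can usually answer from data the directory
            # listing already returned, without a separate stat call.
            if entry.is_dir():
                found.extend(scan_files(entry.path, max_depth, depth + 1))
            else:
                found.append(entry.path)
    return found

# Demo on a temporary tree mirroring the layout in the question.
with tempfile.TemporaryDirectory() as main:
    sub = os.path.join(main, "Sub_Folder")
    deep = os.path.join(sub, "Sub_Folder_2")
    os.makedirs(deep)
    for folder, name in [(main, "a.txt"), (sub, "b.txt"), (deep, "c.txt")]:
        open(os.path.join(folder, name), "w").close()
    names = sorted(os.path.basename(p) for p in scan_files(main))

print(names)  # ['a.txt', 'b.txt']
```

Like os.path.isdir, entry.is_dir() follows symbolic links by default, so a link to a directory is descended into rather than listed as a file.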

answered Sep 24 '22 15:09 by Jean-François Fabre