 

os.walk() never returns when asked to print dirpaths

Tags: python, os.walk

I have a simple directory structure:

rootdir\
    subdir1\
        file1.tif
    subdir2\
        file2.tif
    ...
    subdir13\
        file13.tif
    subdir14\
        file14.tif

If I call:

import os

print os.listdir('absolute\path\to\rootdir')

...then I get what you'd expect:

['subdir1', 'subdir2', ... 'subdir13', 'subdir14']

Same thing happens if I call os.listdir() on those sub-directories. For each one it returns the name of the file in that directory. No problems there.

And if I call:

import os

for dirpath, dirnames, filenames in os.walk('absolute\path\to\rootdir'):
    print filenames
    print dirnames

...then I get what you'd expect:

[]
['subdir1', 'subdir2', ... 'subdir13', 'subdir14']
['file1.tif']
[]
['file2.tif']
[]
...

But here's the strangeness. When I call:

import os

for dirpath, dirnames, filenames in os.walk('absolute\path\to\rootdir'):
    print filenames
    print dirnames
    print dirpath

...it never returns, ever. Even if I try:

print [each[0] for each in os.walk('absolute\path\to\rootdir')]

...or anything of the sort. I can always print the second and third parts of the tuple returned by os.walk(), but the moment that I try to touch the first part the whole thing just stops.

Even stranger, this behavior only appears in scripts launched using the shell. The command line interpreter acts normally. I'm curious, what's going on here?
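Since the behavior depends on how the script is launched, one way to separate the walk itself from the console output is to write each dirpath to a file instead of printing it. The following is only a minimal sketch of that check: `log_dirpaths`, `demo_root`, and `demo_log` are hypothetical names, and the temporary tree is a stand-in for the real rootdir.

```python
import os
import tempfile

def log_dirpaths(root, log_path):
    """Walk root and write repr() of every dirpath to log_path; return the count."""
    count = 0
    with open(log_path, "w") as log:
        for dirpath, dirnames, filenames in os.walk(root):
            # repr() makes any control characters in the path visible in the log
            log.write(repr(dirpath) + "\n")
            count += 1
    return count

# Hypothetical stand-in for the real tree: a temporary root with two sub-directories.
demo_root = tempfile.mkdtemp()
for name in ("subdir1", "subdir2"):
    os.mkdir(os.path.join(demo_root, name))
    open(os.path.join(demo_root, name, name + ".tif"), "w").close()

# The log lives in a separate temporary directory so it is not part of the walk.
demo_log = os.path.join(tempfile.mkdtemp(), "walk.log")
visited = log_dirpaths(demo_root, demo_log)
```

If the log file is complete while the printing version still hangs, that would suggest the walk finishes and the problem lies in how the output reaches the console or the calling process.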

-----EDIT----- Actual code:

ALLOWED_IMGFORMATS = [".jpg",".tif"]

def getCategorizedFiles(pathname):
    cats = [each[0] for each in os.walk(pathname) if not each[0] == pathname]
    ncats = len(cats)
    tree = [[] for i in range(ncats+1)]
    for cat in cats:
        catnum = int(os.path.basename(cat))
        for item in os.listdir(cat):
            if not item.endswith('.sift') and os.path.splitext(item)[-1].lower() in ALLOWED_IMGFORMATS:
                tree[catnum].append(cat + '\\' + item)
    fileDict = {cat : tree[cat] for cat in range(1,ncats+1)}
    return fileDict

----EDIT 2---- Another development. As stated above, the problem occurs when the code is in scripts launched from the shell, but not from every shell: it occurs with Console 2, but not with the Windows command prompt. It also occurs when the script is launched from Java (which is how I originally came across the problem), like so: http://www.programmersheaven.com/mb/python/415726/415726/invoking-python-script-from-java/?S=B20000

ciph345 asked Nov 11 '22 23:11



1 Answer

I've never really trusted os.walk(). Just write your own recursive stuff. It's not hard:

import os

def contents(folder, l):  # Recursive; returns a list of all files with full paths
    for item in os.listdir(folder):
        path = os.path.join(folder, item)
        if os.path.isfile(path):
            l.append(path)
        else:
            contents(path, l)  # anything that is not a file is treated as a directory
    return l

# folder is the path of the directory to scan
all_files = contents(folder, [])  # a separate name, so the function is not overwritten

all_files will then be a list of all the files with full paths included. You can use os.path.split() if you like to make it a little easier to read.

Knowing exactly how your own traversal works removes the uncertainty of relying on os.walk(), so you'll be able to tell whether the problem in your code really has anything to do with os.walk().
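One way to test that is to run a hand-written recursion and os.walk() over the same tree and compare what they return. This is only a sketch: `list_files_recursive`, `list_files_walk`, and `check_root` are hypothetical names, and the temporary tree stands in for the real directory.

```python
import os
import tempfile

def list_files_recursive(folder):
    """Hand-written recursion, in the spirit of the contents() function above."""
    found = []
    for item in os.listdir(folder):
        path = os.path.join(folder, item)
        if os.path.isdir(path):
            found.extend(list_files_recursive(path))
        else:
            found.append(path)
    return found

def list_files_walk(folder):
    """The same listing, built from os.walk()."""
    found = []
    for dirpath, dirnames, filenames in os.walk(folder):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found

# Hypothetical test tree: two sub-directories with one file each.
check_root = tempfile.mkdtemp()
for name in ("1", "2"):
    os.mkdir(os.path.join(check_root, name))
    open(os.path.join(check_root, name, "img" + name + ".tif"), "w").close()

# Sort both lists, because neither approach promises a particular order.
same_result = sorted(list_files_recursive(check_root)) == sorted(list_files_walk(check_root))
```

If `same_result` is true and both calls return, os.walk() is probably not the culprit. The two can legitimately differ when the tree contains symbolic links to directories, which os.walk() does not follow by default.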

If you need the files in a dictionary (for example, keyed by the directory they came from), you can group the resulting list that way as well.
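As a rough sketch of that idea, a list of full paths like the one the recursive function returns can be grouped by parent directory with os.path.split(). The function name `group_by_directory` and the sample paths are made up for illustration.

```python
import os

def group_by_directory(paths):
    """Map each parent directory to the list of file names found in it."""
    grouped = {}
    for path in paths:
        directory, filename = os.path.split(path)
        grouped.setdefault(directory, []).append(filename)
    return grouped

# Hypothetical input, shaped like the tree in the question.
sample_paths = [
    os.path.join("rootdir", "subdir1", "file1.tif"),
    os.path.join("rootdir", "subdir2", "file2.tif"),
]
by_dir = group_by_directory(sample_paths)
```

Here `by_dir` has one key per sub-directory, each holding that directory's file names.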

user2569332 answered Nov 14 '22 22:11