I have a simple directory structure:
rootdir\
    subdir1\
        file1.tif
    subdir2\
        file2.tif
    ...
    subdir13\
        file13.tif
    subdir14\
        file14.tif
If I call:
import os
print os.listdir(r'absolute\path\to\rootdir')
...then I get what you'd expect:
['subdir1', 'subdir2', ... 'subdir13', 'subdir14']
Same thing happens if I call os.listdir() on those sub-directories. For each one it returns the name of the file in that directory. No problems there.
And if I call:
import os
for dirpath, dirnames, filenames in os.walk(r'absolute\path\to\rootdir'):
    print filenames
    print dirnames
...then I get what you'd expect:
[]
['subdir1', 'subdir2', ... 'subdir13', 'subdir14']
['file1.tif']
[]
['file2.tif']
[]
...
But here's the strangeness. When I call:
import os
for dirpath, dirnames, filenames in os.walk(r'absolute\path\to\rootdir'):
    print filenames
    print dirnames
    print dirpath
...it never returns, ever. Even if I try:
print [each[0] for each in os.walk(r'absolute\path\to\rootdir')]
...or anything of the sort. I can always print the second and third parts of the tuple returned by os.walk(), but the moment that I try to touch the first part the whole thing just stops.
Even stranger, this behavior only appears in scripts launched using the shell. The command line interpreter acts normally. I'm curious, what's going on here?
-----EDIT----- Actual code:
import os

ALLOWED_IMGFORMATS = [".jpg", ".tif"]

def getCategorizedFiles(pathname):
    # Every directory below pathname is a category; its name must be an integer.
    cats = [each[0] for each in os.walk(pathname) if not each[0] == pathname]
    ncats = len(cats)
    tree = [[] for i in range(ncats+1)]
    for cat in cats:
        catnum = int(os.path.basename(cat))
        for item in os.listdir(cat):
            # Keep image files only; skip the .sift files stored next to them.
            if not item.endswith('.sift') and os.path.splitext(item)[-1].lower() in ALLOWED_IMGFORMATS:
                tree[catnum].append(cat + '\\' + item)
    # Map category number -> list of image paths.
    fileDict = {cat : tree[cat] for cat in range(1,ncats+1)}
    return fileDict
----EDIT 2---- Another development. As stated above, the problem occurs when the code is in a script launched from a shell, but not from every shell: it occurs with Console 2, but not with the Windows command prompt. It also occurs when the script is launched from Java (which is how I originally came across the problem), like so: http://www.programmersheaven.com/mb/python/415726/415726/invoking-python-script-from-java/?S=B20000
I've never really trusted os.walk(). Just write your own recursive stuff. It's not hard:
import os

def contents(folder, l):  # Recursive; returns a list of all files with full paths
    directContents = os.listdir(folder)
    for item in directContents:
        if os.path.isfile(os.path.join(folder, item)):
            l.append(os.path.join(folder, item))
        else:
            contents(os.path.join(folder, item), l)
    return l

# Use a different name for the result so the function is not overwritten.
files = contents(folder, [])

files will then be a list of all the files with full paths included. You can use os.path.split() if you like to make it a little easier to read.
Knowing how this works eliminates the uncertainty of using os.walk() in your code, which means you'll be able to identify if the problem in your code is really involved with os.walk().
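One way to check whether os.walk() is really involved is to run a hand-written recursion and os.walk() over the same tree and compare the results. The following is only a rough sketch: the helper names (list_files_recursive, list_files_walk, build_demo_tree) and the small temporary tree are made up for illustration and are not from the question. It avoids print so that it should behave the same under Python 2.7 and Python 3.

```python
import os
import tempfile

def list_files_recursive(folder):
    """Hand-written recursion: full paths of all files under folder."""
    found = []
    for item in os.listdir(folder):
        path = os.path.join(folder, item)
        if os.path.isdir(path):
            found.extend(list_files_recursive(path))
        else:
            found.append(path)
    return found

def list_files_walk(folder):
    """The same listing, built from os.walk() and touching dirpath."""
    return [os.path.join(dirpath, name)
            for dirpath, dirnames, filenames in os.walk(folder)
            for name in filenames]

def build_demo_tree():
    """Create a tiny throwaway tree shaped like the one in the question."""
    root = tempfile.mkdtemp()
    for n in (1, 2):
        sub = os.path.join(root, "subdir%d" % n)
        os.mkdir(sub)
        open(os.path.join(sub, "file%d.tif" % n), "w").close()
    return root

demo_root = build_demo_tree()
# If both listings agree, os.walk() itself is behaving as expected here.
same = sorted(list_files_recursive(demo_root)) == sorted(list_files_walk(demo_root))
```

If same is True when the script is launched from the problematic shell too, the hang is more likely to come from somewhere other than os.walk(). Note that the two functions can differ on trees containing symbolic links to directories, which this demo tree does not have.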
If you need to put them in a dictionary (so that you can look files up by a key such as the category name), you can also group your files that way.
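For the dictionary case, something like the following might do. It is a sketch under assumptions: files_by_category and the demo tree are hypothetical names, the keys are the sub-directory names rather than the integers used in the question, and the extension filter is borrowed from the question's ALLOWED_IMGFORMATS. It uses only os.listdir(), so os.walk() is not involved at all.

```python
import os
import tempfile

ALLOWED_IMGFORMATS = (".jpg", ".tif")  # same extensions as in the question

def files_by_category(rootdir):
    """Map each sub-directory name to the image files directly inside it."""
    result = {}
    for name in sorted(os.listdir(rootdir)):
        subdir = os.path.join(rootdir, name)
        if not os.path.isdir(subdir):
            continue  # ignore loose files in the root
        result[name] = [os.path.join(subdir, item)
                        for item in sorted(os.listdir(subdir))
                        if os.path.splitext(item)[-1].lower() in ALLOWED_IMGFORMATS]
    return result

# Throwaway demo tree: two categories, one of which also holds a .sift file.
demo_root = tempfile.mkdtemp()
for cat, names in (("1", ["a.tif", "a.sift"]), ("2", ["b.JPG"])):
    os.mkdir(os.path.join(demo_root, cat))
    for fname in names:
        open(os.path.join(demo_root, cat, fname), "w").close()

categorized = files_by_category(demo_root)
```

Here categorized["1"] holds only the path to a.tif, because a.sift does not match the allowed extensions.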