
Python recursive folder read

I have a C++/Obj-C background and I am just discovering Python (been writing it for about an hour). I am writing a script to recursively read the contents of text files in a folder structure.

The problem I have is that the code I have written only works one folder deep. I can see why in the code (see # hardcoded path), I just don't know how to move forward with Python since I am brand new to it.

Python Code:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()
Brock Woolf asked Feb 06 '10 09:02


People also ask

How do I open files recursively in Python?

Use os.walk. It recursively walks a directory and its subdirectories, and for each directory visited it gives you separate lists of the files and the subdirectories it contains.

How do I walk a directory in Python?

To traverse a directory tree in Python, use the os.walk() function. It accepts up to four arguments (top, topdown, onerror, followlinks) and yields a 3-tuple (dirpath, dirnames, filenames) for each directory it visits.


1 Answer

Make sure you understand the three return values of os.walk:

for root, subdirs, files in os.walk(rootdir): 

has the following meaning:

  • root: Current path which is "walked through"
  • subdirs: Files in root of type directory
  • files: Files in root (not in subdirs) of type other than directory
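To see what those three values look like in practice, here is a minimal sketch that builds a small throwaway tree in a temporary directory and records what os.walk yields for each directory. The helper name describe_walk and the names in the sample tree (sub, a.txt, b.txt) are made up for the illustration.

```python
import os
import tempfile


def describe_walk(top):
    """Return a sorted list of (relative root, subdirs, files) tuples."""
    seen = []
    for root, subdirs, files in os.walk(top):
        rel_root = os.path.relpath(root, top)  # '.' for the top directory itself
        seen.append((rel_root, sorted(subdirs), sorted(files)))
    return sorted(seen)


# Hypothetical layout: top/a.txt and top/sub/b.txt
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, 'sub'))
    for rel_path in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, rel_path), 'w') as f:
            f.write('hello\n')

    walk_result = describe_walk(top)
    for rel_root, subdirs, files in walk_result:
        print('root=%s subdirs=%s files=%s' % (rel_root, subdirs, files))
```

Note that b.txt only shows up in the iteration where root is the sub directory; it never appears in the files list of the top directory.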

And please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join with the folder currently being walked (root) rather than the topmost folder (rootdir). So that must be filePath = os.path.join(root, file). BTW, "file" is a builtin in Python 2, so you don't normally use it as a variable name.
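As a rough illustration of why joining with the walked root matters, the sketch below collects the path of every file under a directory and compares it with the path you would get by joining with the topmost folder. The helper name list_file_paths and the sample tree (one/two/deep.txt) are hypothetical.

```python
import os
import tempfile


def list_file_paths(top):
    """Collect the path of every file below top, at any depth."""
    paths = []
    for root, subdirs, files in os.walk(top):
        for filename in files:
            # Join with root (the directory currently being walked),
            # not with top, so files in nested folders resolve correctly.
            paths.append(os.path.join(root, filename))
    return sorted(paths)


# Hypothetical two-level tree: top/one/two/deep.txt
with tempfile.TemporaryDirectory() as top:
    nested = os.path.join(top, 'one', 'two')
    os.makedirs(nested)
    with open(os.path.join(nested, 'deep.txt'), 'w') as f:
        f.write('nested content\n')

    found = list_file_paths(top)
    wrong = os.path.join(top, 'deep.txt')  # what joining with top would give
    root_based_exists = os.path.exists(found[0])
    top_based_exists = os.path.exists(wrong)
    print(root_based_exists, top_based_exists)
```

The path built from root points at the real file, while the one built from the topmost folder points at a file that does not exist, which is the same failure the original script runs into once the tree is more than one level deep.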

Another problem is your loops, which should look like this, for example:

import os
import sys

walk_dir = sys.argv[1]

print('walk_dir = ' + walk_dir)

# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')

If you didn't know, the with statement for files is a shorthand:

with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
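One way to convince yourself of that equivalence is to raise an exception inside the with block and check afterwards that the file was still closed. This is only a sketch: the temporary file and the ValueError are stand-ins for whatever your own code does.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, 'rb') as f:
        raise ValueError('simulated failure inside the with block')
except ValueError:
    pass  # the error is expected; we only care about the file state

# The with statement closed the file even though the block raised.
closed_after_error = f.closed
print(closed_after_error)

os.remove(path)  # clean up the scratch file
```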
AndiDog answered Oct 17 '22 03:10