I am trying to get a list of strings with the file path and the file name. At the moment I only get the file names into the list.
Code:
hamFileNames = os.listdir("train_data\ham")
Output:
['0002.1999-12-13.farmer.ham.txt', 
 '0003.1999-12-14.farmer.ham.txt', 
 '0005.1999-12-14.farmer.ham.txt']
I would want an output similar to this:
['train_data\ham\0002.1999-12-13.farmer.ham.txt',
 'train_data\ham\0003.1999-12-14.farmer.ham.txt',
 'train_data\ham\0005.1999-12-14.farmer.ham.txt']
Since you have access to the directory path, you could just do:
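import os

dir_path = r"train_data\ham"  # raw string keeps the backslash literal; avoids shadowing the built-in dir()
output = list(map(lambda p: os.path.join(dir_path, p), os.listdir(dir_path)))  # list() because map is lazy on Python 3
or simpler
output = [os.path.join(dir_path, p) for p in os.listdir(dir_path)]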
Where os.path.join will join your directory path with the filenames inside it.
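To try the join approach without touching real data, here is a minimal sketch that builds a throwaway directory first. The helper name list_paths and the file names are made up for illustration, and the result is sorted only because os.listdir returns entries in arbitrary order.

```python
import os
import tempfile


def list_paths(dir_path):
    """Hypothetical helper: join dir_path onto every entry name inside it."""
    return sorted(os.path.join(dir_path, name) for name in os.listdir(dir_path))


# Build a temporary directory with two empty files to list.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.ham.txt", "b.ham.txt"):
        with open(os.path.join(tmp, name), "w"):
            pass
    paths = list_paths(tmp)

# Each element now carries the directory as well as the file name.
print(paths)
```

The printed paths start with whatever temporary directory the system picked, so the exact output differs from run to run.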
If you're on Python 3.5 or higher, skip os.listdir in favor of os.scandir, which is both more efficient and does the work for you (path is an attribute of the result objects):
hamFileNames = [entry.path for entry in os.scandir(r"train_data\ham")]
This also lets you cheaply filter (scandir includes some file info for free, without stat-ing the file), e.g. to keep only files (no directories or special file-system objects):
hamFileNames = [entry.path for entry in os.scandir(r"train_data\ham") if entry.is_file()]
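The is_file() filter can be checked with a small self-contained sketch. It assumes Python 3.6 or later (where os.scandir can be used as a context manager); the helper name list_file_paths, the subdirectory, and the file names are invented for the example.

```python
import os
import tempfile


def list_file_paths(dir_path):
    """Hypothetical helper: full paths of regular files directly inside dir_path."""
    # The with block closes the scandir iterator once the list has been built.
    with os.scandir(dir_path) as entries:
        return sorted(entry.path for entry in entries if entry.is_file())


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))  # a directory, so it should be skipped
    for name in ("0002.ham.txt", "0003.ham.txt"):
        with open(os.path.join(tmp, name), "w"):
            pass
    file_paths = list_file_paths(tmp)
    names = [os.path.basename(p) for p in file_paths]

print(names)  # ['0002.ham.txt', '0003.ham.txt']
```

Only the two files come back; subdir is dropped by the is_file() check.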
If you're on 3.4 or below, you may want to look at the PyPI scandir module (which provides the same API on earlier Python).
Also note: I used a raw string for the path. While \h happens to work without it (it is not a recognized escape sequence, so the backslash is kept, although newer Python versions warn about it), you should always use raw strings for Windows path literals. Otherwise you'll get a nasty shock when you try to use "train_data\foo", where \f is the ASCII form feed character; r"train_data\foo" works just fine, because the r prefix keeps backslashes as literal characters instead of treating them as the start of an escape sequence.
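The difference is easy to see by comparing the two literals directly. This is only an illustrative sketch; the variable names plain and raw are made up.

```python
plain = "train_data\foo"   # \f is parsed as a single form feed character
raw = r"train_data\foo"    # backslash and 'f' are kept as two characters

print(repr(plain))          # 'train_data\x0coo'
print(repr(raw))            # 'train_data\\foo'
print(len(plain), len(raw))  # 13 14
```

The plain literal silently loses the backslash and the letter f, which is why the resulting path would not match any real directory.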