import os
Current_Directory = os.getcwd() # Should be ...\archive
CORPUS_PATHS = sorted([os.path.join("archive", directories) for directories in os.listdir(Current_Directory)])
filenames = []
for items in CORPUS_PATHS:
    filenames.append(sorted([os.path.join(CORPUS_PATHS, fn) for fn in os.listdir(items)]))
print filenames
I am running this code from a folder called archive. Inside archive there are more folders, and each of these folders contains one or more text files. I want to make a list that includes the paths to the text files in each of these folders. However, the following error appears:
[Error 3] The system cannot find the path specified:
The Python script containing this code is currently in the same folder as archive, and running it triggers this error. What should I do to stop the error and get all the file paths?
I am pretty bad at using os and I don't use it that often so I apologize if this is a trivial question.
import os
startpath = "archive"
corpus_path = sorted([os.path.join("archive/", directories) for directories in os.listdir(startpath)])
filenames = []
for items in corpus_path:
    print items
    path = [os.path.join(corpus_path, fn) for fn in os.listdir(items)]
    print path
So I have made some progress, and now corpus_path is essentially a list with the paths to all of the desired folders. Now all I am trying to do is get the paths to all of the text files inside these folders, but I still run into issues I don't understand, with errors such as:
File "C:\Users\David\Anaconda\lib\ntpath.py", line 65, in join
result_drive, result_path = splitdrive(path)
File "C:\Users\David\Anaconda\lib\ntpath.py", line 116, in splitdrive
normp = p.replace(altsep, sep)
AttributeError: 'list' object has no attribute 'replace'
You must be on a Windows machine. The error comes from os.listdir(), which is not being given a valid path.
In line 3 you call os.path.join("archive", directories). You should join the complete path, including the drive (C: or D:), for example "c:/archive/foo" on Windows or "/home/root/archive/foo" on Linux.
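One way to get a complete path without hard-coding a drive letter is to resolve the start folder with os.path.abspath before listing it. The following is only a minimal sketch: the helper name list_subdirs and the temporary demo tree are made up for illustration, and the real archive folder would take the place of the demo path.

```python
import os
import shutil
import tempfile

def list_subdirs(start):
    """Return sorted absolute paths of the directories directly under start."""
    start = os.path.abspath(start)  # anchor the path before listing it
    return sorted(
        os.path.join(start, name)
        for name in os.listdir(start)
        if os.path.isdir(os.path.join(start, name))  # skip plain files
    )

# Demo on a throwaway tree standing in for the real archive folder.
root = tempfile.mkdtemp()
archive = os.path.join(root, "archive")
for name in ("foo2", "foo1"):
    os.makedirs(os.path.join(archive, name))

subdirs = list_subdirs(archive)
print(subdirs)

shutil.rmtree(root)  # clean up the demo tree
```

Because every returned path already starts from the resolved start folder, a later os.listdir() on any of them does not depend on where the script was launched from.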
Read - Python os.path.join on Windows
os.path.join Usage -
On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
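The drive-letter rules quoted above can be checked on any platform through the ntpath module, which is the Windows implementation of os.path. This is a small illustration only; the folder names are placeholders.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# A bare drive letter gives a path relative to the current directory on that drive.
drive_relative = ntpath.join("c:", "foo")          # 'c:foo'

# Adding the root separator anchors the path at the top of the drive.
anchored = ntpath.join("c:\\", "foo")              # 'c:\\foo'

# A rooted component discards the earlier path but keeps the drive letter.
rooted = ntpath.join("c:\\archive", "\\foo")       # 'c:\\foo'

# A component with its own drive letter discards everything before it.
new_drive = ntpath.join("c:\\archive", "d:\\foo")  # 'd:\\foo'

for result in (drive_relative, anchored, rooted, new_drive):
    print(repr(result))
```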
Edit:
You are passing the list corpus_path to os.path.join in line 6, which causes the error. Replace corpus_path with items.
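To see the difference in isolation, here is a tiny sketch with made-up folder and file names. It is written so that it also runs on Python 3, where the same mistake surfaces as a TypeError rather than the AttributeError shown in the traceback.

```python
import os

folders = ["archive/foo1", "archive/foo2"]  # hypothetical folder paths

try:
    os.path.join(folders, "a.txt")  # wrong: the whole list is passed in
    list_join_failed = False
except (TypeError, AttributeError):
    # os.path.join only accepts individual path strings
    list_join_failed = True

# Right: join one folder string at a time.
joined = [os.path.join(folder, "a.txt") for folder in folders]

print(list_join_failed)
print(joined)
```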
I created an archive folder on my D: drive. Under archive I created three folders: foo1, foo2 and foo3. Each folder contains one or two text files. I then tested your code with this modification, and it works fine. Here is the code:
import os
startpath = "d:archive"
corpus_path = sorted([os.path.join("d:", "archive", directories) for directories in os.listdir(startpath)])
filenames = []
for items in corpus_path:
    print items
    path = [os.path.join(items, fn) for fn in os.listdir(items)]
    print path
output:
d:archive\foo1
['d:archive\\foo1\\foo1.txt.txt', 'd:archive\\foo1\\foo11.txt']
d:archive\foo2
['d:archive\\foo2\\foo2.txt.txt']
d:archive\foo3
['d:archive\\foo3\\foo3.txt.txt']