import sys
import glob
import os.path
list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files
for file_name in list_of_files:
    print(file_name)
f= open(file_name, 'r')
lst = []
for line in f:
    line.strip()
    line = line.replace("\n" ,'')
    line = line.replace("//" , '')
    lst.append(line)
f.close()
f=open(os.path.join('/Users/Emily/UpdatedTopics',
    os.path.basename(file_name)) , 'w')
for line in lst:
    f.write(line)
f.close()
I was able to read my files and do some pre-processing. The problem I'm facing is that when I write the files out, I can only see one file. I should get 500 files.
Import the os module in your notebook. Define the path where the text files are located on your system. Build a list of the files in that directory and iterate over it to check whether each one has the expected extension. Read each matching file with a small helper function.
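The steps just described can be sketched roughly as follows. This is only an illustration: the helper names list_text_files and read_text are made up for this example, and ".txt" is assumed to be the extension you want.

```python
import os

def list_text_files(folder, extension=".txt"):
    """Return the sorted full paths of regular files in `folder` with `extension`.

    `list_text_files` is a hypothetical helper, not part of the os module.
    """
    paths = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        # Skip directories and anything that does not end with the extension.
        if os.path.isfile(full_path) and name.lower().endswith(extension):
            paths.append(full_path)
    return paths

def read_text(path):
    """Read a whole file as text; the file is closed when the block exits."""
    with open(path, "r") as handle:
        return handle.read()
```

You would then call something like `for p in list_text_files('/Users/Emily/Topics'): text = read_text(p)`, doing the per-file work inside that loop.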
Use the glob function in the Python library glob to find all the files you want to analyze. You can nest multiple for loops inside each other. A file opened in text mode only accepts strings in write(), so convert other values with str() first.
Python can open and work with multiple files at the same time. Each file can be opened in its own mode, so you can read from some while writing to others. Any number of files can be opened in a single with statement, a form supported since Python 2.7 (and 3.1).
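As a minimal sketch of opening an input and an output file together, something like the following should work. The function name copy_without_slashes is invented for this example, and it applies only the "//" removal from the question, leaving line endings alone.

```python
def copy_without_slashes(src_path, dst_path):
    """Copy src_path to dst_path, removing every "//" along the way.

    Hypothetical helper: both files are opened in one with statement,
    one for reading and one for writing, and both are closed on exit.
    """
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(line.replace("//", ""))
```

Because both handles belong to the same with statement, neither needs an explicit close() call, even if an error occurs partway through.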
As currently written, the only file that gets processed is the last file in the list of file names. You need to indent so that each file gets processed in your loop.
import sys
import glob
import os.path
list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files
for file_name in list_of_files:
    print(file_name)
    # This needs to be done *inside the loop*
    f= open(file_name, 'r')
    lst = []
    for line in f:
        line.strip()
        line = line.replace("\n" ,'')
        line = line.replace("//" , '')
        lst.append(line)
    f.close()
    f=open(os.path.join('/Users/Emily/UpdatedTopics',
        os.path.basename(file_name)) , 'w')
    for line in lst:
        f.write(line)
    f.close()
Python uses indentation instead of curly braces to help group code. Right now the way your code is indented, Python is interpreting it like this:
# get list of files
list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files
# loop through all file names
for file_name in list_of_files:
    # print the name of file
    print(file_name)
# PROBLEM: you remove your indentation so we are no longer in
# our for loop. Now we take the last value of file_name (or the
# last file in the list) and open it and then continue the script
f= open(file_name, 'r')
...
Notice that we leave the for loop because of the change in indentation. The rest of your script runs only once, on the last file name assigned in the for loop.
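One way to make the per-file work harder to misplace is to wrap it in a function and use with blocks, so everything that must happen for each file sits visibly inside the loop. The sketch below is an assumption about how that could look, not the original poster's code: process_all and clean_line are hypothetical names, and the source and destination folders are passed in as arguments.

```python
import glob
import os

def clean_line(line):
    # Same replacements as in the question: drop the newline and any "//".
    return line.replace("\n", "").replace("//", "")

def process_all(src_dir, dst_dir):
    """Clean every .txt file in src_dir and write it to dst_dir under the same name."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for file_name in sorted(glob.glob(os.path.join(src_dir, "*.txt"))):
        # Everything below is indented under the for, so it runs once per file.
        with open(file_name, "r") as src:
            cleaned = [clean_line(line) for line in src]
        out_path = os.path.join(dst_dir, os.path.basename(file_name))
        with open(out_path, "w") as dst:
            dst.writelines(cleaned)
        written.append(out_path)
    return written
```

Calling `process_all('/Users/Emily/Topics', '/Users/Emily/UpdatedTopics')` should then produce one output file per input file, and the returned list lets you check the count.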
Try this:
import os
path = "/Users/Emily/Topics/"
for root, dirs, files in os.walk(path):
    for dir in dirs:
        write_files = [os.path.join(dir) + ".txt"]
        for wf in write_files:
            with open(wf, "w") as outfile:
                pass  # write the processed text to outfile here