Python: reading and writing multiple files

import sys
import glob
import os.path

list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files

for file_name in list_of_files:
    print(file_name)

f= open(file_name, 'r')
lst = []
for line in f:
   line.strip()
   line = line.replace("\n" ,'')
   line = line.replace("//" , '')
   lst.append(line)
f.close()

f=open(os.path.join('/Users/Emily/UpdatedTopics',
os.path.basename(file_name)) , 'w')

for line in lst:
   f.write(line)
f.close()

I was able to read my files and do some pre-processing. The problem I'm facing is that when I write the files out, I can only see one file. I should get 500 files.

EmilyG asked Dec 05 '16 01:12



3 Answers

As currently written, the only file that gets processed is the last file in the list of file names. You need to indent so that each file gets processed in your loop.

import sys
import glob
import os.path

list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files

for file_name in list_of_files:
    print(file_name)

    # This needs to be done *inside the loop*
    f= open(file_name, 'r')
    lst = []
    for line in f:
       line.strip()
       line = line.replace("\n" ,'')
       line = line.replace("//" , '')
       lst.append(line)
    f.close()

    f=open(os.path.join('/Users/Emily/UpdatedTopics',
    os.path.basename(file_name)) , 'w')

    for line in lst:
       f.write(line)
    f.close()
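
The same fix can be sketched with `with` blocks, which close each file automatically. This is only an illustration under a few assumptions: the function name `clean_files` and its parameters `src_dir` and `dst_dir` are hypothetical, and the `os.makedirs` call is an addition that creates the output folder if it is missing.

```python
import glob
import os


def clean_files(src_dir, dst_dir):
    """Remove '//' and line breaks from every .txt file in src_dir,
    writing one output file per input file into dst_dir."""
    # Assumption: create the output folder if it does not exist yet.
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for file_name in sorted(glob.glob(os.path.join(src_dir, '*.txt'))):
        # Everything below is indented, so it runs once per file.
        with open(file_name, 'r') as src:
            cleaned = [line.replace('\n', '').replace('//', '')
                       for line in src]
        out_name = os.path.join(dst_dir, os.path.basename(file_name))
        with open(out_name, 'w') as dst:
            dst.writelines(cleaned)
        written.append(out_name)
    return written
```

Calling `clean_files('/Users/Emily/Topics', '/Users/Emily/UpdatedTopics')` should then produce one output file per input file. The bare `line.strip()` call is left out because it discards its result; assign it (`line = line.strip()`) if trimming whitespace is wanted. As in the code above, line breaks are removed, so each output file ends up as a single line.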

davidlowryduda answered Oct 07 '22 15:10


Python uses indentation instead of curly braces to help group code. Right now the way your code is indented, Python is interpreting it like this:

# get list of files
list_of_files = glob.glob('/Users/Emily/Topics/*.txt') #500 files

# loop through all file names
for file_name in list_of_files:
    # print the name of file
    print(file_name)

# PROBLEM: you remove your indentation so we are no longer in
# our for loop.  Now we take the last value of file_name (or the
# last file in the list) and open it and then continue the script
f= open(file_name, 'r')
...

Notice that we leave the for loop because of the change in indentation. The rest of your script runs only once, on the last file name assigned by the for loop.
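
A small sketch of that behavior, using a hypothetical list of three file names in place of the real 500: the loop variable keeps its last value after the loop ends, so dedented code sees only that one name.

```python
file_names = ['a.txt', 'b.txt', 'c.txt']  # hypothetical stand-ins

processed_outside = []
for file_name in file_names:
    pass  # only this indented part repeats

# Dedented: runs once, after the loop, with the last value of file_name.
processed_outside.append(file_name)

processed_inside = []
for file_name in file_names:
    # Indented: runs once per file name.
    processed_inside.append(file_name)

print(processed_outside)  # ['c.txt']
print(processed_inside)   # ['a.txt', 'b.txt', 'c.txt']
```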

ngoue answered Oct 07 '22 16:10


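Try walking the directory tree with os.walk:

import os

path = "/Users/Emily/Topics/"
for root, dirs, files in os.walk(path):
    for dir_name in dirs:  # renamed from dir, which shadows the built-in
        write_files = [os.path.join(dir_name) + ".txt"]
        for wf in write_files:
            with open(wf, "w") as outfile:
                pass  # the original answer ends here; write your output to outfile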

guozhao answered Oct 07 '22 16:10