I am new to Python. I wrote an algorithm to read 10 txt files in a folder and write the first line of each of them into one txt outfile, but it doesn't work: after I run it, I get no error, yet the outfile never appears.
def MergePerFolder(path):
    path1 = listdir_fullpath(path)
    for i in path1:
        infile = open(i)
        outfile = open('F:// merge1.txt', 'w')
        a = infile.readline().split('.')
        for k in range(len(a)):
            print(a[0], file=outfile, end='')
        infile.close()
    outfile.close
    print("done")
Say you have 12 files in this folder called test, 10 of which are .txt files:
.../
    test/
        01.txt
        02.txt
        03.txt
        04.txt
        05.txt
        06.txt
        07.txt
        08.txt
        09.txt
        10.txt
        random_file.py
        this_shouldnt_be_here.sh
With each .txt file having its corresponding number as its first line (01.txt contains the first line 01, 02.txt contains the first line 02, and so on), you can do this in two ways:
1. The os module
You can import the os module and use its listdir method to list all the files in that directory. It is important to note that all the files in the list will be relative filenames:
>>> import os
>>> all_files = os.listdir("test/")  # imagine you're one directory above the test dir
>>> print(all_files)  # won't necessarily be sorted
['08.txt', '02.txt', '09.txt', '04.txt', '05.txt', '06.txt', '07.txt', '03.txt', '01.txt', 'this_shouldnt_be_here.sh', '10.txt', 'random_file.py']
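As a side note, if you need full paths rather than bare filenames (which is presumably what the listdir_fullpath helper in your code returns), a minimal sketch is to join the directory onto each name with os.path.join:
>>> full_paths = [os.path.join("test", name) for name in os.listdir("test")]
>>> print(full_paths[:3])  # first three entries, same unsorted order as above
['test/08.txt', 'test/02.txt', 'test/09.txt']
(On Windows the separator would be a backslash instead.)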
Now, you only want the .txt files, so with a bit of functional programming using the filter function and an anonymous (lambda) function, you can easily filter them out without a standard for loop:
>>> txt_files = list(filter(lambda x: x[-4:] == '.txt', all_files))
>>> print(txt_files)  # only text files
['08.txt', '02.txt', '09.txt', '04.txt', '05.txt', '06.txt', '07.txt', '03.txt', '01.txt', '10.txt']
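If the slicing in the lambda feels cryptic, an equivalent check (just a different spelling of the same filter) is str.endswith in a list comprehension:
>>> txt_files = [f for f in all_files if f.endswith('.txt')]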
2. The glob module
Similarly, you can use the glob module's glob.glob function to list all the text files in the directory, without any of the functional programming above. The only difference is that glob returns the paths with whatever prefix you passed in:
>>> import glob
>>> txt_files = glob.glob("test/*.txt")
>>> print(txt_files)
['test/08.txt', 'test/02.txt', 'test/09.txt', 'test/04.txt', 'test/05.txt', 'test/06.txt', 'test/07.txt', 'test/03.txt', 'test/01.txt', 'test/10.txt']
What I mean by glob returning the paths with whatever prefix you passed in: for example, if you were inside the test directory and called glob.glob('./*.txt'), you would get a list like:
>>> glob.glob('./*.txt')
['./08.txt', './02.txt', './09.txt', ... ]
By the way, ./ means the current directory. Alternatively, you can leave the ./ off entirely, and the string representations change accordingly:
>>> glob.glob("*.txt")  # already in the directory containing the text files
['08.txt', '02.txt', '09.txt', ... ]
Alright, now the problem with your code is that you are opening all these files without closing them. Generally, the procedure for doing something with a file in Python is this:
fd = open(filename, mode)
fd.method()  # could be write(), read(), readline(), etc...
fd.close()
The problem with this is that if something goes wrong on the second line, where you call a method on the file, the file will never be closed and you're in big trouble.
To prevent this, we use what's called a file context manager in Python, via the with keyword. This ensures the file is closed, with or without failures:
with open(filename, mode) as fd:
    fd.method()
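To convince yourself the file really is closed even when the body raises, here is a quick hypothetical check in the interpreter (using a throwaway file name, scratch.txt):
>>> with open('scratch.txt', 'wt') as fd:
...     raise RuntimeError("something went wrong")
...
Traceback (most recent call last):
  ...
RuntimeError: something went wrong
>>> fd.closed  # the context manager closed it on the way out
True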
readline()
As you probably know already, to extract the first line of a file you simply open it and call the readline() method. We want to do this for every text file listed in txt_files, and yes, you can do it with the functional map function, except this time we won't be writing an anonymous function (for readability):
>>> def read_first_line(file):
...     with open(file, 'rt') as fd:
...         first_line = fd.readline()
...     return first_line
...
>>> output_strings = list(map(read_first_line, txt_files))  # apply read_first_line to every text file
>>> print(output_strings)
['08\n', '02\n', '09\n', '04\n', '05\n', '06\n', '07\n', '03\n', '01\n', '10\n']
If you want output_strings to be sorted, just sort txt_files beforehand or sort output_strings itself. Both work:
output_strings = map(read_first_line, sorted(txt_files))
output_strings = sorted(map(read_first_line, txt_files))
So now you have a list of output strings, and the last thing you want to do is combine them:
>>> output_content = "".join(sorted(output_strings))  # sort and join the output strings with no separator
>>> output_content # as a string
'01\n02\n03\n04\n05\n06\n07\n08\n09\n10\n'
>>> print(output_content) # print as formatted
01
02
03
04
05
06
07
08
09
10
Now it's just a matter of writing this one big string to an output file! Let's call it outfile.txt:
>>> with open('outfile.txt', 'wt') as fd:
...     fd.write(output_content)
...
Then you're done! You're all set! Let's confirm it:
>>> with open('outfile.txt', 'rt') as fd:
...     print(fd.readlines())
...
['01\n', '02\n', '03\n', '04\n', '05\n', '06\n', '07\n', '08\n', '09\n', '10\n']
Putting it all together, I'll use the glob module so the paths always come back with the folder prefix, without the hassle of building absolute paths with the os module and whatnot.
import glob


def read_first_line(file):
    """Gets the first line from a file.

    Returns
    -------
    str
        the first line text of the input file
    """
    with open(file, 'rt') as fd:
        first_line = fd.readline()
    return first_line


def merge_per_folder(folder_path, output_filename):
    """Merges first lines of text files in one folder, and
    writes combined lines into a new output file.

    Parameters
    ----------
    folder_path : str
        String representation of the folder path containing the text files.
    output_filename : str
        Name of the output file the merged lines will be written to.
    """
    # make sure there's a trailing slash on the folder path
    folder_path += "" if folder_path[-1] == "/" else "/"
    # get all text files
    txt_files = glob.glob(folder_path + "*.txt")
    # get first lines; map read_first_line over the (sorted) text files
    output_strings = map(read_first_line, sorted(txt_files))
    output_content = "".join(output_strings)
    # write to file
    with open(folder_path + output_filename, 'wt') as outfile:
        outfile.write(output_content)
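As a quick usage sketch, assuming your ten files live in a folder named test/ as above and you want the merged file called outfile.txt:
merge_per_folder("test/", "outfile.txt")
One thing to keep in mind: the merged file is written into folder_path itself, and since its name also ends in .txt, a second run would sweep outfile.txt up into the glob as well; give it a different extension or write it elsewhere if you plan to re-run the function.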
Let's assume your files are in the folder
path = /home/username/foldername/
so all the files live in the path folder. To read every file in that folder you can use os or glob, as follows.
import os

path = "/home/username/foldername/"
savepath = "/home/username/newfolder/merge1.txt"  # output file (pick whatever name you like)

outfile = open(savepath, 'w')  # open the output file once, outside the loop
for root, subdirs, files in os.walk(path):
    for name in files:
        infile = open(os.path.join(root, name))
        a = infile.readline().split('.')
        print(a[0], file=outfile, end='')  # write the part of the first line before the first '.'
        infile.close()
outfile.close()
print("done")
Or, using glob, you can do it in far fewer lines of code:
import glob

path = "/home/username/foldername/"
savepath = "/home/username/newfolder/merge1.txt"  # output file (pick whatever name you like)

outfile = open(savepath, 'w')  # open the output file once, outside the loop
for file in glob.glob(path + "*.txt"):
    infile = open(file)
    a = infile.readline().split('.')
    print(a[0], file=outfile, end='')  # write the part of the first line before the first '.'
    infile.close()
outfile.close()
print("done")
Hope it works for you.
Thanks to Eddo Hintoso for his detailed answer; I've slightly tweaked it to use yield rather than return so it doesn't need to be mapped. I'm posting it here in case it is useful to anyone else who finds this post.
import glob

files = glob.glob("data/*.txt")

def map_first_lines(file_list):
    for file in file_list:
        with open(file, 'r') as fd:
            yield fd.readline()

[print(f) for f in map_first_lines(files)]
So another way to solve this particular problem:
import glob

def map_first_lines(file_list):
    for file in file_list:
        with open(file, 'rt') as fd:
            yield fd.readline()

def merge_first_lines(file_list, filename='first_lines.txt'):
    with open(filename, 'w') as f:
        for line in map_first_lines(file_list):
            f.write("%s\n" % line.rstrip('\n'))  # readline() keeps the trailing newline, so strip it before adding one

files = glob.glob("data/*.txt")
merge_first_lines(files)
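One caveat: glob.glob makes no guarantee about ordering, so if you want the merged lines in filename order (as Eddo Hintoso's answer does with sorted()), sort the list before passing it in:
merge_first_lines(sorted(files))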