
What's the most pythonic way to iterate over all the lines of multiple files?

I want to treat many files as if they were all one file. What's the proper pythonic way to go from [filenames] => [file objects] => [lines] using generators, without reading an entire file into memory?

We all know the proper way to open a file:

with open("auth.log", "rb") as f:
    # count the lines lazily; sum() over the raw lines would raise TypeError
    print(sum(1 for line in f))

And we know the correct way to link several iterators/generators into one long one:

>>> list(itertools.chain(range(3), range(3)))
[0, 1, 2, 0, 1, 2]

but how do I link multiple files together and preserve the context managers?

with open("auth.log", "rb") as f0:
    with open("auth.log.1", "rb") as f1:
        for line in itertools.chain(f0, f1):
            do_stuff_with(line)

    # f1 is now closed
# f0 is now closed
# gross

I could ignore the context managers and do something like this, but it doesn't feel right:

files = itertools.chain(*(open(f, "rb") for f in file_names))
for line in files:
    do_stuff_with(line)

Or is this the kind of thing Async IO (PEP 3156) is for, and I'll just have to wait for the elegant syntax later?

Conrad.Dean asked Apr 19 '13 01:04




1 Answer

There's always fileinput.

import fileinput

for line in fileinput.input(filenames):
    ...

Reading the source, however, it appears that fileinput.FileInput can't be used as a context manager [1]. To work around that, you can use contextlib.closing, since FileInput instances have a sanely implemented close method:

import fileinput
from contextlib import closing

with closing(fileinput.input(filenames)) as line_iter:
    for line in line_iter:
        ...

An alternative that keeps the context manager is to write a simple function that loops over the files and yields lines as it goes:

def iter_lines(files):  # renamed so it doesn't shadow the fileinput module
    for f in files:
        with open(f, 'r') as fin:
            for line in fin:
                yield line

No real need for itertools.chain here, IMHO. The magic is in the yield statement, which turns an ordinary function into a fantastically lazy generator: each file is opened only when the loop reaches it and is closed once its last line has been consumed.
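
To see that laziness in action, here is a minimal, self-contained sketch of the same generator idea. The name iter_lines and the sample file names are made up for the demo, and the files are written to a temporary directory so the snippet can run anywhere. It also shows one possible way to stop early: wrapping the generator in contextlib.closing closes it on exit, which lets the inner with block close whichever file is currently open.

```python
import os
import tempfile
from contextlib import closing


def iter_lines(paths):
    """Yield every line of every file, keeping only one file open at a time."""
    for path in paths:
        with open(path, "r") as handle:
            for line in handle:
                yield line


# Create two small sample files (hypothetical names, demo only).
tmp_dir = tempfile.mkdtemp()
paths = []
for name, text in [("auth.log", "a\nb\n"), ("auth.log.1", "c\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as out:
        out.write(text)
    paths.append(path)

# Full pass: lines arrive in file order, as if from one long file.
all_lines = list(iter_lines(paths))

# Early exit: closing() calls the generator's close() method, which
# unwinds the inner "with open(...)" and closes the file in use.
with closing(iter_lines(paths)) as lines:
    first_line = next(lines)
```

Without closing, a partially consumed generator keeps its current file open until the generator is closed or garbage-collected, so the wrapper is mainly worth it when the loop may break out early.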


[1] As an aside, starting with Python 3.2, fileinput.FileInput is implemented as a context manager, which does exactly what we did above with contextlib.closing. Our example then becomes:

# Python 3.2+ version
import fileinput

with fileinput.input(filenames) as line_iter:
    for line in line_iter:
        ...

The earlier contextlib.closing example still works on Python 3.2+ as well.
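
As a rough illustration of the Python 3.2+ form, the sketch below runs it against two throwaway files and records which file and line number each line came from. The sample file names and the records list are assumptions made for the demo; filename() and filelineno() are methods of the FileInput object that the with statement yields.

```python
import fileinput
import os
import tempfile

# Create two small sample files (hypothetical names, demo only).
tmp_dir = tempfile.mkdtemp()
paths = []
for name, text in [("auth.log", "first\nsecond\n"), ("auth.log.1", "third\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as out:
        out.write(text)
    paths.append(path)

records = []
with fileinput.input(paths) as line_iter:
    for line in line_iter:
        # filename() and filelineno() describe the line that was just read.
        source = os.path.basename(line_iter.filename())
        records.append((source, line_iter.filelineno(), line.rstrip("\n")))
```

Tracking the source file this way is something the plain generator approach does not provide unless you yield the file name yourself.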

mgilson answered Oct 13 '22 05:10
