How to read file N lines at a time?

Tags: python, iterator, file-io

I need to read a big file by reading at most N lines at a time, until EOF. What is the most effective way of doing it in Python? Something like:

with open(filename, 'r') as infile:
    while not EOF:
        lines = [get next N lines]
        process(lines)

asked Apr 29 '11 13:04

madprogrammer

3 Answers

One solution would be a list comprehension and the slice operator:

with open(filename, 'r') as infile:
    lines = [line for line in infile][:N]

After this, lines is a list of lines. However, this loads the complete file into memory before slicing. If you don't want that (i.e. if the file could be really large), there is another solution using islice from the itertools module, which reads the file lazily:

from itertools import islice
with open(filename, 'r') as infile:
    lines_gen = islice(infile, N)

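lines_gen is a lazy iterator that yields the next lines of the file, at most N of them. It has to be consumed while the file is still open, i.e. inside the with block, for example in a loop like this:

    # still inside the with block
    for line in lines_gen:
        print(line)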

Both solutions give you up to the first N lines (or fewer, if the file doesn't have that many).
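To keep reading until EOF rather than stopping after the first N lines, islice can be called repeatedly on the same file object, since each call continues from where the previous one stopped. The following is a minimal sketch of that idea; the helper name read_in_chunks and the in-memory sample file are assumptions made for illustration, not part of the original answer.

```python
import io
from itertools import islice


def read_in_chunks(infile, n):
    """Yield successive lists of at most n lines from an open file.

    read_in_chunks is a hypothetical helper name used for illustration.
    """
    while True:
        # Each islice call resumes from the file's current position.
        chunk = list(islice(infile, n))
        if not chunk:  # an empty list means EOF was reached
            break
        yield chunk


# Demonstration with an in-memory file standing in for a real one.
sample = io.StringIO("a\nb\nc\nd\ne\n")
chunks = list(read_in_chunks(sample, 2))
# chunks == [['a\n', 'b\n'], ['c\n', 'd\n'], ['e\n']]
```

With a real file, the same generator would be used inside the with block, calling process(lines) for each chunk it yields. Only one chunk is held in memory at a time.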

answered Oct 12 '22 17:10

Martin Thurau

A file object is an iterator over lines in Python. To iterate over the file N lines at a time, you could use the grouper() function from the Itertools Recipes section of the documentation. (Also see What is the most “pythonic” way to iterate over a list in chunks?):

try:
    from itertools import izip_longest  # Python 2
except ImportError:  # Python 3
    from itertools import zip_longest as izip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

Example

with open(filename) as f:
    for lines in grouper(f, N, ''):
        # every group has exactly N items; the last one is padded with ''
        assert len(lines) == N
        # process N lines here
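Because grouper() pads the final group up to N items, code that wants "at most N lines" has to strip the padding itself. One possible way, sketched here under the assumption of Python 3 and a hypothetical variant named grouper_no_pad, is to pad with a unique sentinel object and filter it out afterwards:

```python
from itertools import zip_longest  # named izip_longest in Python 2

_FILL = object()  # unique sentinel that can never be equal to a real line


def grouper_no_pad(iterable, n):
    """Yield tuples of up to n items; the last tuple may be shorter.

    grouper_no_pad is a hypothetical variant of the grouper() recipe.
    """
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_FILL):
        # Drop the padding that zip_longest added to the final group.
        yield tuple(item for item in group if item is not _FILL)


groups = list(grouper_no_pad("abcde", 2))
# groups == [('a', 'b'), ('c', 'd'), ('e',)]
```

The sentinel is compared by identity, so a legitimate empty string or None in the input is never mistaken for padding.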

answered Oct 12 '22 16:10

jfs

This code works for any number of lines in the file and any N. If the file has 1100 lines and N = 200, process() is called five times with chunks of 200 lines and once with the remaining 100 lines.

with open(filename, 'r') as infile:
    lines = []
    for line in infile:
        lines.append(line)
        if len(lines) >= N:
            process(lines)
            lines = []
    if len(lines) > 0:
        process(lines)
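The 1100 / 200 arithmetic can be checked with a small experiment. This sketch wraps the same loop in a function (the name process_in_chunks is hypothetical) and feeds it an in-memory file of 1100 lines, recording the size of every chunk passed to the callback:

```python
import io


def process_in_chunks(infile, n, process):
    """Call process() on lists of at most n lines (hypothetical wrapper)."""
    lines = []
    for line in infile:
        lines.append(line)
        if len(lines) >= n:
            process(lines)
            lines = []
    if lines:  # leftover lines when the total is not a multiple of n
        process(lines)


sizes = []
fake_file = io.StringIO("".join("line %d\n" % i for i in range(1100)))
process_in_chunks(fake_file, 200, lambda chunk: sizes.append(len(chunk)))
# sizes == [200, 200, 200, 200, 200, 100]
```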

answered Oct 12 '22 16:10

Anatolij
