Mixing file.readline() and file.next()

Tags:

python

I noticed some strange behavior today playing around with next() and readline(). It seems that both functions produce the same results (which is what I expect). However, when I mix them, I get a ValueError. Here's what I did:

>>> f = open("text.txt", 'r')
>>> f.readline()
'line 0\n'
>>> f.readline()
'line 1\n'
>>> f.readline()
'line 2\n'
>>> f.next()
'line 3\n'
>>> f.next()
'line 4\n'
>>> f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Mixing iteration and read methods would lose data
>>>
>>> f = open("text.txt", 'r')
>>> f.next()
'line 0\n'
>>> f.next()
'line 1\n'
>>> f.next()
'line 2\n'
>>> f.readline()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: Mixing iteration and read methods would lose data

So the overall question here is: what's going on under the hood that causes this error?

Some related questions that I would like answered, if the explanation doesn't already cover them:

1. What are the differences between next() and readline()?
2. When I do for f in file: which function am I calling (and does it matter)?
3. Why can I call next() after readline(), but not the other way around?

Thanks in advance,

I don't think it matters, but in case this is version dependent, I'm on Python 2.7.6 for Windows


asked Mar 04 '14 18:03

wnnmaw

1 Answer

According to the Python documentation:

A file object is its own iterator, for example iter(f) returns f (unless f is closed). When a file is used as an iterator, typically in a for loop (for example, for line in f: print line.strip()), the next() method is called repeatedly. This method returns the next input line, or raises StopIteration when EOF is hit when the file is open for reading (behavior is undefined when the file is open for writing). In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right. However, using seek() to reposition the file to an absolute position will flush the read-ahead buffer.
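To picture the hidden read-ahead buffer the documentation describes, here is a toy model. It is only a sketch, not CPython's actual implementation: the class name ReadAheadFile, the chunk_size parameter, and the use of io.StringIO as the underlying stream are all assumptions made for illustration.

```python
import io


class ReadAheadFile(object):
    """Toy model (hypothetical) of a file whose next() reads ahead."""

    def __init__(self, raw, chunk_size=16):
        self._raw = raw                # underlying stream, e.g. io.StringIO
        self._chunk_size = chunk_size  # how much next() grabs at a time
        self._buffer = u""             # data read ahead but not yet returned

    def readline(self):
        # readline() reads straight from the underlying stream. If next()
        # has left data in the buffer, that data would be skipped, so
        # refuse instead of silently losing it.
        if self._buffer:
            raise ValueError(
                "Mixing iteration and read methods would lose data")
        return self._raw.readline()

    def next(self):
        # Named next() to mirror the Python 2 method from the question.
        # Fill the buffer in large chunks until it holds a complete line.
        while u"\n" not in self._buffer:
            chunk = self._raw.read(self._chunk_size)
            if not chunk:
                break
            self._buffer += chunk
        if not self._buffer:
            raise StopIteration
        line, newline, rest = self._buffer.partition(u"\n")
        self._buffer = rest            # leftover stays hidden in the buffer
        return line + newline
```

With this model, calling readline() on a fresh object and then next() works, because the buffer is still empty when readline() runs. A readline() call after next() raises the ValueError as soon as next() has left unread data in the buffer.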

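The next method reads more than is needed, for efficiency reasons. This breaks readline. So the answers are:

1. next is faster because it uses a read-ahead buffer.
2. for s in f: uses next.
3. Before next has been called, the read-ahead buffer is empty, so readline does a standard (slower) read on the file and there is no problem. Once next has filled the buffer, readline would skip the buffered data, hence the ValueError.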
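In practice, the simplest way to avoid the error is to pick one access style per file object and stick with it. The sketch below shows both options. The function names and the header_lines parameter are hypothetical, and io.StringIO merely stands in for a real file in any quick experiment.

```python
import io
from itertools import islice


def read_header_and_body(f, header_lines=1):
    """Use iteration only: take the header with islice, then loop."""
    header = list(islice(f, header_lines))  # pulls lines via the iterator
    body = [line for line in f]             # continues with the same iterator
    return header, body


def read_with_readline_only(f):
    """Use readline() only, via the two-argument form of iter().

    iter(callable, sentinel) keeps calling f.readline() until it returns
    the sentinel, which is the empty string at end of file for text files.
    """
    return list(iter(f.readline, ""))
```

Neither function touches the other access style, so the read-ahead buffer never gets in the way. The second form is also handy when other read methods need to be called between lines.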


answered Oct 01 '22 00:10

hivert
