
io.BufferedReader peek function returning all the text in the buffer

Tags:

python

I am using Python 3.4.1 on Windows 8.

I would like to read a file through a buffered interface that lets me peek a certain number of bytes ahead as well as read bytes. io.BufferedReader seems like the right choice.

Unfortunately, io.BufferedReader.peek seems useless because it appears to return all the bytes stored in the buffer, rather than the number requested. In fact, the documentation of this function allows exactly that (see its final sentence):

peek([size]) Return bytes from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call. The number of bytes returned may be less or more than requested.

To demonstrate what I consider useless behaviour, I have the following test file called Test1.txt:

first line
second line
third line

I create the io.BufferedReader object like this in IDLE:

>>> import io
>>> stream = io.BufferedReader(io.FileIO('Test1.txt'))

and then ask for two bytes,

>>> stream.peek(2)
b'first line\r\nsecond line\r\nthird line'

Eh? That's the entire file, which fits inside the default buffer (8192 bytes on my system). If I change the buffer size, I can confirm that peek() is just returning the contents of the buffer:

>>> stream2 = io.BufferedReader(io.FileIO('Test1.txt'), buffer_size=2)
>>> stream2.peek(17)
b'fi'
>>> stream2.peek(17)
b'fi'
>>> stream2.read(2)
b'fi'
>>> stream2.peek(17)
b'rs'

To be clear, the following is the output I expect to see:

>>> stream = io.BufferedReader(io.FileIO('Test1.txt'))
>>> stream.peek(2)
b'fi'
>>> stream.read(1)
b'f'
>>> stream.peek(2)
b'ir'

That is, a typical buffered stream.

What am I doing wrong in constructing this BufferedReader? How can I observe the behaviour I expect to see in Python 3.4.1?

Charles asked Jun 29 '14 09:06


1 Answer

.peek() is indeed implemented to return the current contents of the buffer; if you combine it with .read() calls, you'll find that less and less of the buffer is returned until the buffer is filled up again.
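
A small sketch of that shrinking behaviour, assuming CPython's io module. Here io.BytesIO stands in for the Test1.txt file from the question so the snippet is self-contained; the variable names are only illustrative.

```python
import io

# Same bytes as Test1.txt in the question (Windows line endings).
data = b"first line\r\nsecond line\r\nthird line"

# io.BytesIO is a stand-in for io.FileIO('Test1.txt').
stream = io.BufferedReader(io.BytesIO(data))

# The first peek fills the buffer and returns everything in it,
# not just the 2 bytes asked for.
before = stream.peek(2)

# Consuming 6 bytes advances the position inside the buffer.
consumed = stream.read(6)

# The next peek returns what is left in the buffer, 6 bytes fewer.
after = stream.peek(2)

print(len(before), len(after))
```

The size passed to peek() does not cap the result in either call; only the amount of unread data in the buffer does.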

For most uses of .peek() this is more than fine. The number of bytes lets you limit how much data is expected from the underlying I/O source if the buffer is empty, which in turn is important if that source blocks on reads.

Simply slice the returned value:

stream.peek(num)[:num]
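
Wrapped in a helper, the slice reproduces the output the question expected. This is a minimal sketch: peek_at_most is a hypothetical name, not part of the io module, and io.BytesIO again stands in for the file.

```python
import io


def peek_at_most(stream, num):
    """Return up to num bytes from stream without advancing the position.

    Hypothetical helper: peek() may return more than num bytes, so the
    result is sliced down to the requested length.
    """
    return stream.peek(num)[:num]


data = b"first line\r\nsecond line\r\nthird line"
stream = io.BufferedReader(io.BytesIO(data))

first = peek_at_most(stream, 2)    # b'fi'
consumed = stream.read(1)          # b'f'
second = peek_at_most(stream, 2)   # b'ir'
```

The slice only trims the result. Because the documentation also allows peek() to return fewer bytes than requested, the helper can still return less than num bytes, for example at the end of the file.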
Martijn Pieters answered Sep 19 '22 21:09