i want to read bytes. sys.stdin is opened in text mode, yet it has a buffer that can be used to read bytes: sys.stdin.buffer.
my problem is that when i pipe data into python i only seem to have 2 options if i want readahead; anything else gives me an io.UnsupportedOperation: File or stream is not seekable.
1. reading buffered text from sys.stdin, encoding that text back to bytes, and seeking back (sys.stdin.read(1).encode(); sys.stdin.seek(-1, io.SEEK_CUR)). unacceptable, because the input stream may contain bytes that can't be decoded as text.
2. using peek to get some bytes from stdin's buffer, slicing that to the appropriate number, and praying, as peek doesn't guarantee anything: it may give fewer or more bytes than you request… (sys.stdin.buffer.peek(1)[:1]). peek is really underdocumented and gives you a bunch of bytes that you then have to slice, which costs performance.
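For reference, the second option can be sketched as a small helper. This is only an illustration: peek_byte is a hypothetical name, and the stream is simulated with io.BytesIO instead of a real pipe. It assumes a blocking stream, where (as far as I can tell from CPython's behaviour) peek only returns an empty result at end of input.

```python
import io

def peek_byte(stream):
    """Return the next byte of a BufferedReader without consuming it.

    peek() may return more bytes than requested, so the result is sliced.
    Assumption: on a blocking stream an empty result means end of input.
    """
    return stream.peek(1)[:1]

# Simulated input containing a byte that is not valid UTF-8.
reader = io.BufferedReader(io.BytesIO(b"\xffabc"))

first = peek_byte(reader)        # look at the next byte
still_first = peek_byte(reader)  # peeking again does not advance the stream
consumed = reader.read(1)        # the real read returns that same byte

reader.read()                    # drain the rest
at_eof = peek_byte(reader)       # nothing left to peek at
```

With a real pipe the same helper would be called as peek_byte(sys.stdin.buffer).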
btw. that error really only applies when piping: for ./myscript.py <somefile, sys.stdin.buffer supports seeking. yet sys.stdin is always the same hierarchy of objects:
$ cat testio.py
#!/usr/bin/env python3
from sys import stdin
print(stdin)
print(stdin.buffer)
print(stdin.buffer.raw)
$ ./testio.py
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
<_io.BufferedReader name='<stdin>'>
<_io.FileIO name='<stdin>' mode='rb'>
$ ./testio.py <somefile
[the same as above]
$ echo hi | ./testio.py
[the same as above]
some initial ideas like wrapping the byte stream into a random access buffer fail with the same error as mentioned above: BufferedRandom(sys.stdin.buffer).seek(0) ⇒ io.UnsupportedOperation…
finally, for your convenience i present:
IOBase
├RawIOBase
│└FileIO
├BufferedIOBase  (buffers a RawIOBase)
│├BufferedWriter┐
│├BufferedReader│
││        └─────┴BufferedRWPair
│├BufferedRandom (implements seeking)
│└BytesIO        (wraps a bytes)
└TextIOBase
 ├TextIOWrapper  (wraps a BufferedIOBase)
 └StringIO       (wraps a str)
and in case you forgot the question: how do i get the next byte from stdin without de/encoding anything, and without advancing the stream’s cursor?
The exception doesn't come from Python, but from the operating system, which doesn't allow seeking on pipes. (If you redirect input from a regular file, it can be seeked, even though it's standard input.) This is why you get the error in one case and not in the other, even though the classes are the same.
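A small sketch can make this visible without touching the real stdin. It assumes a POSIX system; compare_seekability is a hypothetical helper name, and the temporary file and os.pipe() stand in for `<somefile` and `echo hi |` respectively.

```python
import os
import tempfile

def compare_seekability():
    """Open a regular file and a pipe the same way and report seekability."""
    # Regular file: the OS allows lseek() on it.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "somefile")
        with open(path, "wb") as f:
            f.write(b"hi\n")
        with open(path, "rb") as from_file:
            file_info = (type(from_file).__name__, from_file.seekable())

    # Pipe: same Python class on top, but the OS refuses to seek.
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"hi\n")
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as from_pipe:
        pipe_info = (type(from_pipe).__name__, from_pipe.seekable())

    return file_info, pipe_info

file_info, pipe_info = compare_seekability()
print(file_info)
print(pipe_info)
```

Both objects are BufferedReader instances; only the underlying file descriptor differs, and that is what decides whether seek() works.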
The classic Python 2 solution for readahead would be to wrap the stream in your own stream implementation that implements readahead:
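import os
import sys
import cStringIO

class Peeker(object):
    """Wrap a file object and add a peek() that does not consume input."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = cStringIO.StringIO()

    def _append_to_buf(self, contents):
        # Append at the end of the buffer, then restore the read position.
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def peek(self, size):
        # Serve already-peeked data first, so repeated peeks return the same data.
        oldpos = self.buf.tell()
        contents = self.buf.read(size)
        self.buf.seek(oldpos)
        if len(contents) < size:
            more = self.fileobj.read(size - len(contents))
            self._append_to_buf(more)
            contents += more
        return contents

    def read(self, size=None):
        # Drain the peek buffer before reading from the underlying stream.
        if size is None:
            return self.buf.read() + self.fileobj.read()
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        # A buffered line without a newline continues in the underlying stream.
        line = self.buf.readline()
        if not line.endswith('\n'):
            line += self.fileobj.readline()
        return line

sys.stdin = Peeker(sys.stdin)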
In Python 3, supporting the full sys.stdin while peeking the undecoded stream is complicated: one would wrap stdin.buffer as shown above, then instantiate a new TextIOWrapper over the peekable stream, and install that TextIOWrapper as sys.stdin.
However, since you only need to peek at sys.stdin.buffer, the above code will work just fine, after changing cStringIO.StringIO to io.BytesIO and '\n' to b'\n'.
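As a rough sketch, a Python 3 adaptation along those lines could look like the following. It is not a drop-in replacement for sys.stdin: it only covers peek, read and readline on a byte stream. Here peek serves already-buffered bytes before reading more, so repeated peeks return the same data, and the demo uses io.BytesIO as a stand-in for sys.stdin.buffer.

```python
import io
import os

class Peeker:
    """Wrap a binary stream and add a peek() that does not consume input."""

    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = io.BytesIO()

    def _append_to_buf(self, contents):
        # Append at the end of the buffer, then restore the read position.
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def peek(self, size):
        # Serve already-peeked bytes first; read more only if needed.
        oldpos = self.buf.tell()
        contents = self.buf.read(size)
        self.buf.seek(oldpos)
        if len(contents) < size:
            more = self.fileobj.read(size - len(contents))
            self._append_to_buf(more)
            contents += more
        return contents

    def read(self, size=None):
        # Drain the peek buffer before reading from the underlying stream.
        if size is None:
            return self.buf.read() + self.fileobj.read()
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        # A buffered line without a newline continues in the underlying stream.
        line = self.buf.readline()
        if not line.endswith(b"\n"):
            line += self.fileobj.readline()
        return line

# Demo on simulated input that starts with bytes that are not valid UTF-8.
stream = Peeker(io.BytesIO(b"\xff\xfehello\nworld\n"))
first = stream.peek(1)    # next byte, not consumed
two = stream.peek(2)      # next two bytes, still not consumed
line = stream.readline()  # the full first line, including the peeked bytes
rest = stream.read()      # everything that remains
```

For piped input the wrapper would be created as Peeker(sys.stdin.buffer), and all further reads should then go through the wrapper instead of sys.stdin.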