python 3: reading bytes from stdin pipe with readahead

i want to read bytes. sys.stdin is opened in text mode, yet it has a buffer that can be used to read bytes: sys.stdin.buffer.

my problem is that when i pipe data into python, i only seem to have 2 options if i want readahead; anything else gets me an io.UnsupportedOperation: File or stream is not seekable.

  1. reading buffered text from sys.stdin, encoding that text to bytes, and seeking back

    (sys.stdin.read(1).encode(); sys.stdin.seek(-1, io.SEEK_CUR))

    unacceptable due to undecodable bytes in the input stream.

  2. using peek to get some bytes from the stdin’s buffer, slicing that to the appropriate number, and praying, as peek doesn’t guarantee anything: it may give less or more than you request…

    (sys.stdin.buffer.peek(1)[:1])

    peek is really underdocumented and gives you a bunch of bytes that you have to performance-intensively slice.
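
For reference, option 2 can be wrapped in a small helper. This is only a minimal sketch: peek_byte is a hypothetical name, and the demo uses a BufferedReader over an in-memory BytesIO as a stand-in for sys.stdin.buffer. It assumes peek hands back at least one byte unless the stream is at EOF, which is how CPython's BufferedReader behaves on a blocking stream, though the documentation does not spell that out as a guarantee.

```python
import io


def peek_byte(stream):
    """Return the next byte without consuming it, or b"" at EOF.

    Assumes `stream` is an io.BufferedReader (such as sys.stdin.buffer).
    peek() may return more than one byte, so the result is sliced.
    """
    return stream.peek(1)[:1]


# Stand-in for sys.stdin.buffer so the sketch runs without a pipe.
stream = io.BufferedReader(io.BytesIO(b"abc"))

first = peek_byte(stream)    # b"a"
again = peek_byte(stream)    # still b"a": the cursor has not moved
consumed = stream.read(1)    # b"a": now the byte is actually consumed
following = peek_byte(stream)  # b"b"

stream.read()                # drain the rest
at_eof = peek_byte(stream)   # b""
```

The slice copies a single byte, so its cost is small compared with the read that fills the buffer.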

btw. that error really only applies when piping: for ./myscript.py <somefile, sys.stdin.buffer supports seeking. yet the sys.stdin is always the same hierarchy of objects:

$ cat testio.py
#!/usr/bin/env python3
from sys import stdin
print(stdin)
print(stdin.buffer)
print(stdin.buffer.raw)
$ ./testio.py
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='UTF-8'>
<_io.BufferedReader name='<stdin>'>
<_io.FileIO name='<stdin>' mode='rb'>
$ ./testio.py <somefile
[the same as above]
$ echo hi | ./testio.py
[the same as above]

some initial ideas like wrapping the byte stream into a random access buffer fail with the same error as mentioned above: BufferedRandom(sys.stdin.buffer).seek(0) raises io.UnsupportedOperation…

finally, for your convenience i present:

Python’s io class hierarchy

IOBase
├RawIOBase
│└FileIO
├BufferedIOBase  (buffers a RawIOBase)
│├BufferedWriter┐ 
│├BufferedReader│
││        └─────┴BufferedRWPair
│├BufferedRandom (implements seeking)
│└BytesIO        (wraps a bytes)
└TextIOBase
 ├TextIOWrapper  (wraps a BufferedIOBase)
 └StringIO       (wraps a str)

and in case you forgot the question: how do i get the next byte from stdin without de/encoding anything, and without advancing the stream’s cursor?

flying sheep asked Jan 11 '13 17:01


1 Answer

The exception doesn't come from Python, but from the operating system, which doesn't allow seeking on pipes. (If you redirect input from a regular file, it is seekable, even though it's standard input.) This is why you get the error in one case and not in the other, even though the classes are the same.
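
The difference can be observed directly through seekable(). The following is a small sketch rather than anything from the question: is_seekable is a hypothetical helper, and an os.pipe() plus a temporary file stand in for `echo hi | ./testio.py` and `./testio.py <somefile`.

```python
import io
import os
import tempfile


def is_seekable(fd):
    """Open an existing file descriptor the way stdin is opened and ask
    whether the resulting stream is seekable. The descriptor is left open."""
    with io.open(fd, "rb", closefd=False) as stream:
        return stream.seekable()


# A pipe, as in `echo hi | ./testio.py`.
read_end, write_end = os.pipe()
pipe_seekable = is_seekable(read_end)
os.close(read_end)
os.close(write_end)

# A regular file, as in `./testio.py <somefile`.
with tempfile.TemporaryFile() as regular:
    file_seekable = is_seekable(regular.fileno())

print(pipe_seekable, file_seekable)
```

Both descriptors end up behind the same BufferedReader class; only the answer the operating system gives for the underlying descriptor differs.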

The classic Python 2 solution for readahead would be to wrap the stream in your own stream implementation that implements readahead:

import cStringIO
import os
import sys

class Peeker(object):
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = cStringIO.StringIO()

    def _append_to_buf(self, contents):
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

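    def peek(self, size):
        # Serve the peek from already buffered bytes first, and read only
        # what is still missing, so repeated peeks return the same data.
        oldpos = self.buf.tell()
        contents = self.buf.read(size)
        self.buf.seek(oldpos)
        if len(contents) < size:
            extra = self.fileobj.read(size - len(contents))
            self._append_to_buf(extra)
            contents += extra
        return contents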

    def read(self, size=None):
        if size is None:
            return self.buf.read() + self.fileobj.read()
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        line = self.buf.readline()
        if not line.endswith('\n'):
            line += self.fileobj.readline()
        return line

sys.stdin = Peeker(sys.stdin)

In Python 3, supporting the full sys.stdin while peeking at the undecoded stream is more complicated: you would wrap stdin.buffer as shown above, then instantiate a new TextIOWrapper over the peekable stream, and install that TextIOWrapper as sys.stdin.
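
One way this could look is sketched below. It is an illustration under assumptions, not a tested drop-in: PeekableBuffer is a hypothetical class, it derives from io.BufferedIOBase so that TextIOWrapper finds the methods it expects (readable, read, read1, flush, close), and the demo uses an in-memory stream instead of the real stdin.

```python
import io


class PeekableBuffer(io.BufferedIOBase):
    """Hypothetical read-only byte stream whose peek() never consumes."""

    def __init__(self, fileobj):
        self._fileobj = fileobj   # e.g. sys.stdin.buffer
        self._pending = b""       # bytes peeked at but not yet read

    def readable(self):
        return True

    def peek(self, size=1):
        missing = size - len(self._pending)
        if missing > 0:
            self._pending += self._fileobj.read(missing)
        return self._pending[:size]

    def read(self, size=-1):
        if size is None or size < 0:
            data, self._pending = self._pending + self._fileobj.read(), b""
            return data
        data, self._pending = self._pending[:size], self._pending[size:]
        if len(data) < size:
            data += self._fileobj.read(size - len(data))
        return data

    def read1(self, size=-1):
        # TextIOWrapper fetches its chunks through read1().
        if self._pending:
            if size is None or size < 0:
                size = len(self._pending)
            data, self._pending = self._pending[:size], self._pending[size:]
            return data
        return self._fileobj.read1(size)


# Demo on an in-memory stream. For the real thing one would write:
#   peekable = PeekableBuffer(sys.stdin.buffer)
#   sys.stdin = io.TextIOWrapper(peekable, encoding=sys.stdin.encoding)
peekable = PeekableBuffer(io.BytesIO("héllo\nworld\n".encode("utf-8")))
text = io.TextIOWrapper(peekable, encoding="utf-8")

first_byte = peekable.peek(1)   # raw, undecoded byte
first_line = text.readline()    # decoded text, including the peeked byte
remainder = text.read()
```

Note that TextIOWrapper reads ahead in chunks, so once text has been read through it, peek on the byte layer only sees what the text layer has not already pulled in.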

However, since you only need to peek at sys.stdin.buffer, the Peeker class above will work just fine after changing cStringIO.StringIO to io.BytesIO and '\n' to b'\n'.
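
With those two substitutions applied, a Python 3 version might look like the sketch below. The _buffered helper is an addition for illustration: it lets peek return bytes that are already buffered, so repeated peeks see the same data. The demo feeds the class from an in-memory BytesIO; with a real pipe you would pass sys.stdin.buffer.

```python
import io
import os


class Peeker:
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self.buf = io.BytesIO()   # holds bytes peeked at but not yet read

    def _append_to_buf(self, contents):
        # Append at the end without moving the read position.
        oldpos = self.buf.tell()
        self.buf.seek(0, os.SEEK_END)
        self.buf.write(contents)
        self.buf.seek(oldpos)

    def _buffered(self):
        # Return the unread part of the buffer, leaving the position alone.
        oldpos = self.buf.tell()
        data = self.buf.read()
        self.buf.seek(oldpos)
        return data

    def peek(self, size):
        buffered = self._buffered()
        if len(buffered) < size:
            extra = self.fileobj.read(size - len(buffered))
            self._append_to_buf(extra)
            buffered += extra
        return buffered[:size]

    def read(self, size=None):
        if size is None:
            return self.buf.read() + self.fileobj.read()
        contents = self.buf.read(size)
        if len(contents) < size:
            contents += self.fileobj.read(size - len(contents))
        return contents

    def readline(self):
        line = self.buf.readline()
        if not line.endswith(b"\n"):
            line += self.fileobj.readline()
        return line


stream = Peeker(io.BytesIO(b"ab\ncd\n"))   # or Peeker(sys.stdin.buffer)
first = stream.peek(1)     # b"a"
same = stream.peek(1)      # b"a" again, nothing was consumed
line = stream.readline()   # b"ab\n"
rest = stream.read()       # b"cd\n"
```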

user4815162342 answered Sep 27 '22 22:09