
How to (can I) ask a PIPE how many bytes it has available for reading?

I've implemented a non-blocking reader in Python, and I need to make it more efficient.

The background: I have massive amounts of output that I need to read from one subprocess (started with Popen()) and pass to another thread. Reading the output from that subprocess must not block for more than a few ms (preferably for as little time as is necessary to read available bytes).

Currently, I have a utility class which takes a file descriptor (stdout) and a timeout. I select() and readline(1) until one of three things happens:

  1. I read a newline
  2. my timeout (a few ms) expires
  3. select tells me there's nothing to read on that file descriptor.

Then I return the buffered text to the calling method, which does stuff with it.

Now, for the real question: because I'm reading so much output, I need to make this more efficient. I'd like to do that by asking the file descriptor how many bytes are pending and then readline([that many bytes]). It's supposed to just pass stuff through, so I don't actually care where the newlines are, or even if there are any. Can I ask the file descriptor how many bytes it has available for reading, and if so, how?

I've done some searching, but I'm having a really hard time figuring out what to search for, let alone if it's possible.

Even just a point in the right direction would be helpful.

Note: I'm developing on Linux, but that shouldn't matter for a "Pythonic" solution.

Matt asked Nov 19 '13 17:11


1 Answer

On Linux, os.pipe() is just a wrapper around pipe(2). Both return a pair of file descriptors. Normally one would use lseek(2) (os.lseek() in Python) to reposition the offset of a file descriptor as a way to get the amount of available data. However, not all file descriptors are capable of seeking.

On Linux, trying lseek(2) on a pipe returns an error (ESPIPE); see the manual page. That's because a pipe is more or less a buffer between a producer and a consumer of data. The size of that buffer is system-dependent.
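To see the failure for yourself, here is a minimal sketch that tries to seek on a fresh pipe and reports the error number. The function name seek_error_on_pipe is made up for illustration; the calls themselves (os.pipe(), os.lseek()) are standard library.

```python
import errno
import os


def seek_error_on_pipe():
    """Try to seek on the read end of a new pipe.

    Returns the errno of the resulting OSError, or None if the seek
    unexpectedly succeeded. (Hypothetical helper, for illustration only.)
    """
    read_fd, write_fd = os.pipe()
    try:
        # Seeking to the end is the usual trick for finding a file's size;
        # pipes do not support it.
        os.lseek(read_fd, 0, os.SEEK_END)
    except OSError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return None
```

On Linux, seek_error_on_pipe() should come back as errno.ESPIPE ("Illegal seek").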

On Linux, a pipe has a 64 KiB (65,536-byte) buffer by default (since kernel 2.6.11), so that is the most data you can have available at any one time.
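Given that upper bound, one option is to skip the byte count entirely: wait with select() and then ask os.read() for up to one full pipe buffer, which returns whatever is there without waiting for a newline. The following is only a sketch under that assumption; read_available and PIPE_CHUNK are names invented here, and the 65536 figure is the assumed Linux default, not something queried from the system.

```python
import os
import select

PIPE_CHUNK = 65536  # assumed default Linux pipe capacity (64 KiB)


def read_available(fd, timeout=0.005, chunk=PIPE_CHUNK):
    """Return the bytes currently readable on fd.

    Returns None if nothing became readable within `timeout` seconds,
    and b"" if the writing end has been closed (end of file).
    (Hypothetical helper, for illustration only.)
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        return None
    # os.read() returns as soon as some data is available; it does not
    # wait for `chunk` bytes or for a newline.
    return os.read(fd, chunk)
```

With a Popen object you would pass proc.stdout.fileno() as fd. Mixing this with buffered calls such as readline() on the same stream is probably best avoided, since the file object may already hold data in its own buffer.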

Edit: If you can change the way your subprocess works, you might consider using a memory mapped file, or a nice big piece of shared memory.

Edit2: Using polling objects (select.poll()) is probably faster than select().
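A rough sketch of the polling-object variant, assuming a platform where select.poll() exists (Linux does). The generator name drain and its defaults are made up for illustration; note that poll() takes its timeout in milliseconds, unlike select().

```python
import os
import select


def drain(fd, timeout_ms=5, chunk=65536):
    """Yield chunks of bytes from fd until EOF or until a poll times out.

    The poll object is created once and reused for every wait.
    (Hypothetical helper, for illustration only.)
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    while True:
        events = poller.poll(timeout_ms)
        if not events:
            return  # nothing arrived within the timeout
        data = os.read(fd, chunk)
        if not data:
            return  # writer closed the pipe: end of file
        yield data
```

The caller can forward each chunk to the consuming thread as it arrives, for example with `for chunk in drain(proc.stdout.fileno()): queue.put(chunk)`, where proc and queue are whatever the surrounding program already has.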

Roland Smith answered Sep 30 '22 17:09