subprocess.Popen stdin read file

Tags:

python

subprocess

I'm trying to call a process on a file after part of it has been read. For example:

import subprocess

with open('in.txt', 'r') as a, open('out.txt', 'w') as b:
    header = a.readline()
    subprocess.call(['sort'], stdin=a, stdout=b)

This works fine if I don't read anything from a before calling subprocess.call, but once I read anything from it, the subprocess sees no input at all. This is with Python 2.7.3. I can't find anything in the documentation that explains this behaviour, and a (very) brief glance at the subprocess source didn't reveal a cause.

asked Mar 14 '14 22:03

DRayX

1 Answer

If you open the file unbuffered then it works:

import subprocess

with open('in.txt', 'rb', 0) as a, open('out.txt', 'w') as b:
    header = a.readline()
    rc = subprocess.call(['sort'], stdin=a, stdout=b)

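The subprocess module works at the file descriptor level (the operating system's low-level, unbuffered I/O). It can work with os.pipe(), socket.socket(), pty.openpty(), or anything else with a valid .fileno() method, if the OS supports it.

Mixing buffered and unbuffered I/O on the same file is not recommended: a buffered readline() reads ahead a whole block, so the descriptor that the child process inherits is already positioned past the header line, often at the end of the file.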
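The gap between the two layers can be observed directly. The following is a minimal Python 3 sketch, not part of the original answer; the helper name offsets_after_readline and the temporary file contents are made up for illustration. It compares the position the buffered file object reports with the offset of the underlying descriptor:

```python
import os
import tempfile


def offsets_after_readline(path):
    """Return (logical position, descriptor offset) after one readline()."""
    with open(path, 'rb') as f:      # default (buffered) binary file
        f.readline()                 # fills the buffer, consumes one line
        logical = f.tell()           # where the buffered object thinks it is
        # Where the OS-level descriptor really is (what a child would inherit).
        physical = os.lseek(f.fileno(), 0, os.SEEK_CUR)
    return logical, physical


# Demo on a small throwaway file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'in.txt')
    with open(path, 'wb') as out:
        out.write(b'header\n' + b'data\n' * 10)
    logical, physical = offsets_after_readline(path)
    print('file.tell():      ', logical)
    print('descriptor offset:', physical)
```

The descriptor offset is the one that matters to a child process, since the child never sees Python's buffer.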

On Python 2, calling file.flush() before the subprocess call causes the output to appear, e.g.:

import subprocess
# 2nd
with open(__file__) as file:
    header = file.readline()
    file.seek(file.tell()) # synchronize (for io.open and Python 3)
    file.flush()           # synchronize (for C stdio-based file on Python 2)
    rc = subprocess.call(['cat'], stdin=file)

The issue can be reproduced without the subprocess module, using os.read():

#!/usr/bin/env python
# 2nd
import os

with open(__file__) as file: #XXX fully buffered text file EATS INPUT
    file.readline() # ignore header line
    os.write(1, os.read(file.fileno(), 1<<20))

If the buffer size is small then the rest of the file is printed:

#!/usr/bin/env python
# 2nd
import os

bufsize = 2 #XXX MAY EAT INPUT
with open(__file__, 'rb', bufsize) as file:
    file.readline() # ignore header line
    os.write(2, os.read(file.fileno(), 1<<20))

It eats more input if the length of the first line is not evenly divisible by bufsize.

The default bufsize and bufsize=1 (line-buffered) behave similarly on my machine: the beginning of the file vanishes, around 4KB of it.

For all buffer sizes, file.tell() reports the position at the beginning of the 2nd line. Using next(file) instead of file.readline() leads to file.tell() reporting around 5K on my machine on Python 2, due to the read-ahead buffer bug (io.open() gives the expected 2nd line position).

Calling file.seek(file.tell()) before the subprocess call doesn't help on Python 2 with the default stdio-based file objects. It works with the open() functions from the io and _pyio modules on Python 2, and with the default open() (also io-based) on Python 3.
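A more explicit variant of the same idea, which is not taken from the answer above, is to set the descriptor's offset directly with os.lseek() just before starting the child. The sketch below assumes Python 3.5+ (for subprocess.run) and a file opened in binary mode, where tell() is a plain byte offset; the names run_on_rest and ECHO are hypothetical, and a small Python child stands in for sort or cat:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in child process: copies its stdin to its stdout unchanged.
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']


def run_on_rest(path, argv):
    """Read the first line of path, then run argv with the rest on stdin."""
    with open(path, 'rb') as f:      # binary mode: tell() is a byte offset
        header = f.readline()
        # Rewind the descriptor from the read-ahead position back to the
        # logical position. The child shares this offset with the parent.
        os.lseek(f.fileno(), f.tell(), os.SEEK_SET)
        result = subprocess.run(argv, stdin=f, stdout=subprocess.PIPE)
    return header, result.stdout


# Demo on a small throwaway file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'in.txt')
    with open(path, 'wb') as out:
        out.write(b'header\nsecond\nthird\n')
    header, rest = run_on_rest(path, ECHO)
    print(repr(header), repr(rest))
```

After the os.lseek() call the Python-level buffer no longer matches the descriptor, so the file object should not be read from again in the parent.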

Trying the io and _pyio modules on Python 2 and Python 3, with and without file.flush(), produces various results. This confirms that mixing buffered and unbuffered I/O on the same file descriptor is not a good idea.
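One way to avoid the mix entirely, offered here as a sketch rather than as part of the original answer, is to keep all reads in the buffered layer and hand the remaining bytes to the child through a pipe. It assumes Python 3.5+ and that the rest of the file fits in memory; run_with_input and ECHO are hypothetical names, and a small Python child stands in for the real command:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in child process: copies its stdin to its stdout unchanged.
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']


def run_with_input(path, argv):
    """Read the header and the remainder in Python, then pipe the remainder."""
    with open(path, 'rb') as f:
        header = f.readline()
        rest = f.read()              # still buffered I/O, so nothing is lost
    # The child gets the data through a pipe, not through the file's descriptor.
    result = subprocess.run(argv, input=rest, stdout=subprocess.PIPE)
    return header, result.stdout


# Demo on a small throwaway file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'in.txt')
    with open(path, 'wb') as out:
        out.write(b'header\nsecond\nthird\n')
    header, rest = run_with_input(path, ECHO)
    print(repr(header), repr(rest))
```

The trade-off is memory: the whole remainder is held in the parent before the child starts, so for very large files passing the descriptor itself may be preferable.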

answered Sep 23 '22 03:09

jfs
