Why is subprocess.run output different from shell output of same command?

I am using subprocess.run() for some automated testing, mostly to automate running:

dummy.exe < file.txt > foo.txt
diff file.txt foo.txt

If you execute the above redirection in a shell, the two files are always identical. But whenever file.txt is too long, the below Python code does not return the correct result.

This is the Python code:

import subprocess
import sys


def main(argv):

    exe_path = r'dummy.exe'
    file_path = r'file.txt'

    with open(file_path, 'r') as test_file:
        stdin = test_file.read().strip()
        p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE, universal_newlines=True)
        out = p.stdout.strip()
        err = p.stderr
        if stdin == out:
            print('OK')
        else:
            print('failed: ' + out)

if __name__ == "__main__":
    main(sys.argv[1:])

Here is the C++ code in dummy.cc:

#include <iostream>


int main()
{
    int size, count, a, b;
    std::cin >> size;
    std::cin >> count;

    std::cout << size << " " << count << std::endl;


    for (int i = 0; i < count; ++i)
    {
        std::cin >> a >> b;
        std::cout << a << " " << b << std::endl;
    }
}

file.txt can be anything like this:

1 100000
0 417
0 842
0 919
...

The second integer on the first line is the number of lines following, hence here file.txt will be 100,001 lines long.

Question: Am I misusing subprocess.run()?

Edit

My exact Python code after taking the comments into account (newlines, 'rb' mode):

import subprocess
import sys
import os


def main(argv):

    base_dir = os.path.dirname(__file__)
    exe_path = os.path.join(base_dir, 'dummy.exe')
    file_path = os.path.join(base_dir, 'infile.txt')
    out_path = os.path.join(base_dir, 'outfile.txt')

    with open(file_path, 'rb') as test_file:
        stdin = test_file.read().strip()
        p = subprocess.run([exe_path], input=stdin, stdout=subprocess.PIPE)
        out = p.stdout.strip()
        if stdin == out:
            print('OK')
        else:
            with open(out_path, "wb") as text_file:
                text_file.write(out)

if __name__ == "__main__":
    main(sys.argv[1:])

Here is the first diff:

[image: first diff]

Here is the input file: https://drive.google.com/open?id=0B--mU_EsNUGTR3VKaktvQVNtLTQ

user2346536 asked Jun 09 '16 19:06


1 Answer

To reproduce the shell command from Python, going through the shell:

subprocess.run("dummy.exe < file.txt > foo.txt", shell=True, check=True)

Or, without the shell, in pure Python:

with open('file.txt', 'rb', 0) as input_file, \
     open('foo.txt', 'wb', 0) as output_file:
    subprocess.run(["dummy.exe"], stdin=input_file, stdout=output_file, check=True)

It works with arbitrarily large files.
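The redirection above can be combined with the `diff` step from the question into one helper. This is only a minimal sketch: the function name `run_with_file_redirection` and the `ECHO_CMD` stand-in (a child Python process that copies stdin to stdout, used here in place of dummy.exe) are assumptions for illustration, not part of the original code.

```python
import filecmp
import subprocess
import sys


def run_with_file_redirection(cmd, input_path, output_path):
    """Run cmd like `cmd < input_path > output_path`, then compare both files.

    Returns True if the output file is byte-for-byte identical to the input.
    """
    # buffering=0 gives unbuffered binary files; the child reads and writes
    # the files directly, so no pipe is involved.
    with open(input_path, 'rb', 0) as input_file, \
         open(output_path, 'wb', 0) as output_file:
        subprocess.run(cmd, stdin=input_file, stdout=output_file, check=True)
    # shallow=False forces a comparison of the file contents, not just os.stat().
    return filecmp.cmp(input_path, output_path, shallow=False)


# Hypothetical stand-in for dummy.exe: copies stdin to stdout unchanged.
ECHO_CMD = [
    sys.executable, '-c',
    'import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)',
]

# Usage with the real program from the question would look like:
# ok = run_with_file_redirection(['dummy.exe'], 'file.txt', 'foo.txt')
```

Note that the comparison is byte-exact, so it behaves like `diff` only if dummy.exe writes the same line endings that file.txt uses.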

You could use subprocess.check_call() in this case (available since Python 2) instead of subprocess.run(), which is available only in Python 3.5+.

"Works very well, thanks. But then why was the original failing? Pipe buffer size, as in Kevin's answer?"

It has nothing to do with OS pipe buffers. The warning from the subprocess docs that @Kevin J. Chase cites is unrelated to subprocess.run(). You should care about OS pipe buffers only if you use process = Popen() and manually read()/write() via multiple pipe streams (process.stdin/.stdout/.stderr).
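As a rough illustration of why pipe buffers are not the issue here: subprocess.run() with input= uses communicate() internally, which feeds stdin and drains stdout concurrently, so a payload far larger than any OS pipe buffer round-trips without blocking. The sketch below assumes a hypothetical echo child (`ECHO_CMD`, a Python process copying stdin to stdout) rather than dummy.exe, so it does not exercise the Windows bug discussed next.

```python
import subprocess
import sys

# Hypothetical stand-in for dummy.exe: copies stdin to stdout unchanged.
ECHO_CMD = [
    sys.executable, '-c',
    'import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)',
]


def round_trip(payload):
    """Send payload (bytes) to the child's stdin and return its stdout."""
    # run() writes the input and reads the output at the same time,
    # so neither side can fill up a pipe and deadlock.
    p = subprocess.run(ECHO_CMD, input=payload, stdout=subprocess.PIPE, check=True)
    return p.stdout


if __name__ == '__main__':
    payload = b'0 417\n' * 1000000  # about 6 MB, far beyond a typical pipe buffer
    print(round_trip(payload) == payload)
```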

It turns out that the observed behavior is due to a Windows bug in the Universal CRT. Here's the same issue reproduced without Python: Why would redirection work where piping fails?

As said in the bug description, to work around it:

  • "use a binary pipe and do text mode CRLF => LF translation manually on the reader side" or use ReadFile() directly instead of std::cin
  • or wait for the Windows 10 update this summer (where the bug should be fixed)
  • or use a different C++ compiler e.g., there is no issue if you use g++ on Windows

The bug affects only text pipes, i.e., code that uses < and > redirection should be fine (stdin=input_file, stdout=output_file should still work; if it does not, it is some other bug).
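To make the first workaround concrete, here is what "a binary pipe with manual CRLF => LF translation on the reader side" can look like. This is a sketch in Python of the general idea only: in the bug report the reader is the C++ program (std::cin in dummy.exe), so the actual fix would have to live there, and the helper name `iter_lf_chunks` is an assumption for illustration.

```python
import io


def iter_lf_chunks(stream, chunk_size=65536):
    """Read a binary stream in chunks and yield the data with CRLF turned into LF.

    A lone CR that is not followed by LF is passed through unchanged.
    """
    pending_cr = False  # True if the previous chunk ended with a CR we held back
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            if pending_cr:
                yield b'\r'  # the held-back CR was the last byte of the stream
            return
        if pending_cr:
            chunk = b'\r' + chunk
            pending_cr = False
        if chunk.endswith(b'\r'):
            # Hold the CR back: the matching LF may arrive in the next chunk.
            chunk = chunk[:-1]
            pending_cr = True
        if chunk:
            yield chunk.replace(b'\r\n', b'\n')


if __name__ == '__main__':
    data = io.BytesIO(b'1 2\r\n0 417\r\n0 842\r\n')
    print(b''.join(iter_lf_chunks(data, chunk_size=4)))
```

The held-back CR handles the case where a CRLF pair is split across two reads, which a plain per-chunk replace() would miss.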

jfs answered Oct 20 '22 01:10