
Safely running code in a process, redirect stdout in multiprocessing.Process

I'm working on a dataset from a MOOC. I have a lot of python3 code snippets that I need to run and get the results from. To do this I've written a python script that loops over each snippet. For each snippet I:

  1. Create new StringIO objects
  2. Set sys.stdout and sys.stderr to my StringIO buffers
  3. Execute the code snippet in a threading.Thread object
  4. Join the thread
  5. Log the results from the StringIO buffers
  6. Restore stdout and stderr
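Steps 1, 2, 5 and 6 can be sketched without the thread using contextlib.redirect_stdout and contextlib.redirect_stderr (both in the standard library as of Python 3.5). This is a minimal sketch, and capture_exec is a hypothetical name. The context managers restore the original streams even if the snippet raises, but like assigning sys.stdout by hand they change the stream for the whole process, so they do not keep output from different threads apart.

```python
import contextlib
import io


def capture_exec(code):
    """Run `code` and return the (stdout, stderr) text it produced."""
    out, err = io.StringIO(), io.StringIO()  # step 1: fresh buffers
    # steps 2 and 6: redirect both streams; they are restored when the block exits
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        exec(code, {})  # step 3, run directly instead of in a thread
    return out.getvalue(), err.getvalue()  # step 5: read the results


print(capture_exec("import sys\nprint('hello')\nprint('oops', file=sys.stderr)"))
```

Here the final call returns ('hello\n', 'oops\n'), and the outer print goes to the real stdout because the redirection has already been undone.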

This works fine for "correct" code, but it has issues in other cases:

  • When the code has an infinite loop, thread.join(0.5) returns after the timeout but doesn't kill the thread. The thread is a daemon thread, so it keeps running quietly in the background until my loop finishes.
  • When the code has an infinite loop with a print(), the thread keeps writing to my real stdout once I set sys.stdout back to the default (away from the StringIO buffer). This pollutes my reporting.
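The first problem can be reproduced in isolation. In this sketch, runaway stands in for a snippet with an infinite loop; the stop event is a hypothetical hook that exists only so the demo can end, and a real snippet offers nothing like it. join(0.5) returns on schedule, the thread is still alive, and threading.Thread has no terminate() method to stop it.

```python
import threading
import time

stop = threading.Event()


def runaway():
    # Stand-in for a snippet stuck in an infinite loop. The `stop` event is
    # hypothetical and exists only so this demo can shut down cleanly.
    while not stop.is_set():
        time.sleep(0.01)


thread = threading.Thread(target=runaway, daemon=True)
thread.start()
thread.join(0.5)  # returns after 0.5 s whether or not the thread finished

alive_after_join = thread.is_alive()          # True: the loop is still running
can_terminate = hasattr(thread, "terminate")  # False: no way to kill a thread
print(alive_after_join, can_terminate)        # prints: True False

stop.set()  # only possible because this demo cooperates
thread.join()
```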

Here is my current code:

import io
import sys
import threading

# stripOuterException() and restoreBuiltinFunctions() are my own helpers (not shown).


def execCode(code, testScript=None):
    # create file-like strings to capture output
    codeOut = io.StringIO()
    codeErr = io.StringIO()

    # capture output and errors (note: this swaps the streams for the whole
    # process, not just for the worker thread)
    sys.stdout = codeOut
    sys.stderr = codeErr

    def worker():
        exec(code, globals())

        if testScript:
            # discard the snippet's own output before running the tests
            sys.stdout.truncate(0)
            sys.stdout.seek(0)
            # sys.stderr.truncate(0)
            # sys.stderr.seek(0)
            exec(testScript)

    thread = threading.Thread(target=worker, daemon=True)
    # thread = Process(target=worker) #, stdout=codeOut, stderr=codeErr)
    thread.start()
    thread.join(0.5)  # 500ms

    execError = codeErr.getvalue().strip()
    execOutput = codeOut.getvalue().strip()

    if thread.is_alive():
        # threading.Thread has no terminate() method, so the thread cannot be
        # stopped here; it keeps running in the background
        execError = "TimeError: run time exceeded"

    codeOut.close()
    codeErr.close()

    # restore stdout and stderr
    sys.stdout = sys.__stdout__
    sys.stderr = sys.__stderr__

    # restore any overridden functions
    restoreBuiltinFunctions()

    if execError:
        return False, stripOuterException(execError)
    else:
        return True, execOutput

To handle this case, I've been trying to use multiprocessing.Process and/or contextlib.redirect_stdout to run the code in a separate process (so that I can call process.terminate()), but I haven't had any success capturing stdout/stderr.
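One way the two can be combined, as a minimal sketch assuming a POSIX system where the "fork" start method is available: do the redirection inside the child process, where it cannot touch the parent's streams, and send the captured text back through a multiprocessing.Queue. The names _child and exec_in_process are hypothetical, and stripOuterException/restoreBuiltinFunctions are left out; anything a snippet overrides lives only in the child and disappears with it.

```python
import contextlib
import io
import multiprocessing
import queue


def _child(code, results):
    # Runs inside the child process. Redirecting here only affects the
    # child, so the parent's real stdout/stderr are never replaced.
    out, err = io.StringIO(), io.StringIO()
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            exec(code, {"__name__": "__snippet__"})
    except Exception as exc:
        err.write("{}: {}".format(type(exc).__name__, exc))
    finally:
        # Always report back, even if the snippet raised.
        results.put((out.getvalue(), err.getvalue()))


def exec_in_process(code, timeout=0.5):
    """Run `code` in a child process; return (ok, output_or_error).

    Assumes the "fork" start method (POSIX only).
    """
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    proc = ctx.Process(target=_child, args=(code, results), daemon=True)
    proc.start()
    try:
        # Read before joining, so a large output cannot block the child
        # while it waits to flush the queue.
        out, err = results.get(timeout=timeout)
    except queue.Empty:
        # Unlike a thread, a process can actually be killed.
        proc.terminate()
        proc.join()
        return False, "TimeError: run time exceeded"
    proc.join()
    if err.strip():
        return False, err.strip()
    return True, out.strip()
```

For example, exec_in_process("print('hi')") should return (True, 'hi'), while exec_in_process("while True: print('x')") returns the timeout error without anything leaking into the parent's stdout.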

So my question is: How can I redirect or capture stdout/stderr from a process? Alternatively, is there some other way I could go about trying to run and capture the output of arbitrary code?

(And yes, I know this is a bad idea in general; I'm running it in a virtual machine just in case there is malicious code in there somewhere)

Python version is 3.5.3


Update

It occurs to me that there is a little more flexibility in this situation. I have a function, preprocess(code), that accepts the code submission as a string and alters it. Mostly I've been using it to swap out the values of some variables using regular expressions.

Here is an example implementation:

def preprocess(code):
    import re
    # raw strings, so that \s reaches the regex engine unchanged
    rx = re.compile(r'earlier_date\s*=\s*.+')
    code = rx.sub("earlier_date = date(2016, 5, 3)", code)
    rx = re.compile(r'later_date\s*=\s*.+')
    code = rx.sub("later_date = date(2016, 5, 24)", code)
    return code
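For example, applied to a made-up submission (the sample code below is hypothetical), the function pins both dates, so the submission's output no longer depends on the day it is run. The substitutions are the same as above, written with raw-string patterns.

```python
import re


def preprocess(code):
    # Replace everything after "earlier_date =" / "later_date =" on that line
    code = re.sub(r'earlier_date\s*=\s*.+', "earlier_date = date(2016, 5, 3)", code)
    code = re.sub(r'later_date\s*=\s*.+', "later_date = date(2016, 5, 24)", code)
    return code


submission = (
    "from datetime import date\n"
    "earlier_date = date(2017, 1, 1)\n"
    "later_date = date.today()\n"
    "print((later_date - earlier_date).days)\n"
)
fixed = preprocess(submission)
print(fixed)
```

The print() line is left alone because neither name is followed by "=" there, and running the rewritten code prints 21.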

I could use the preprocess function to help redirect stdout.

asked Nov 13 '17 at 01:11 by Zack


1 Answer

Communicating with a running process is not straightforward in Python. For some reason you can only do it once in a subprocess's life cycle (Popen.communicate() reads until end-of-file and then waits for the process to terminate). From my experience, it is best to run a thread that starts the process and, after a timeout, collects its output and terminates the subprocess.

Something like:

import subprocess
import threading


def subprocess_with_timeout(cmd, timeout_sec, stdin_data=None):
    """Execute `cmd` in a subprocess and enforce timeout `timeout_sec` seconds.

    Send `stdin_data` (bytes) to the subprocess.

    Return subprocess exit code and outputs on natural completion of the subprocess.
    Raise an exception if timeout expires before subprocess completes."""
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    timer = threading.Timer(timeout_sec, proc.kill)
    # this will terminate subprocess after timeout
    timer.start()

    # you will be blocked here until process terminates (by itself or by timeout death switch)
    stdoutdata, stderrdata = proc.communicate(stdin_data)

    if timer.is_alive():
        # Process completed naturally - cancel timer and return exit code
        timer.cancel()
        return proc.returncode, stdoutdata, stderrdata
    # Process killed by timer - raise exception
    raise TimeoutError('Process #%d killed after %f seconds' % (proc.pid, timeout_sec))
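Since Python 3.3, Popen.communicate() also accepts a timeout argument, so the same contract can be met without a Timer thread. This sketch follows the pattern from the subprocess documentation: communicate() raises subprocess.TimeoutExpired but does not kill the child itself, so the except branch kills it and calls communicate() again to reap it.

```python
import subprocess
import sys


def subprocess_with_timeout(cmd, timeout_sec, stdin_data=None):
    """Same contract as above, using communicate(timeout=...) instead of a Timer."""
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        stdoutdata, stderrdata = proc.communicate(stdin_data, timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        # communicate() leaves the child running on timeout: kill it,
        # then reap it so no zombie process is left behind.
        proc.kill()
        proc.communicate()
        raise TimeoutError('Process #%d killed after %f seconds'
                           % (proc.pid, timeout_sec)) from None
    return proc.returncode, stdoutdata, stderrdata


# Usage: run a one-line script in a fresh interpreter
returncode, out, err = subprocess_with_timeout([sys.executable, "-c", "print('hi')"], 5)
```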

So, run a threaded executor that calls subprocess_with_timeout for each snippet. It should handle the inputs and save the results.
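A minimal sketch of that idea, under a few assumptions: each snippet runs as python -c in a fresh interpreter (so it no longer shares the harness's globals() the way exec(code, globals()) did), subprocess.run() (available since Python 3.5) supplies the timeout and kills the child when it expires, and concurrent.futures.ThreadPoolExecutor plays the role of the threaded executor. The names run_snippet and run_all are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_snippet(code, timeout=0.5):
    """Run one snippet in a fresh interpreter; return (ok, output_or_error)."""
    try:
        result = subprocess.run([sys.executable, "-c", code],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True,  # decode to str (works on 3.5)
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        # run() has already killed the child before re-raising
        return False, "TimeError: run time exceeded"
    if result.returncode != 0:
        return False, result.stderr.strip()
    return True, result.stdout.strip()


def run_all(snippets, timeout=0.5, workers=4):
    """Run snippets from worker threads; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda code: run_snippet(code, timeout), snippets))
```

Because every snippet gets its own interpreter and its own pipes, a runaway print() loop writes only into its pipe and cannot reach the harness's stdout; the timeout also covers interpreter start-up, so it may need to be larger than 0.5 s on a slow VM.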

Another idea is to use a web server to do the IPC. See this link.

answered Oct 16 '22 at 17:10 by igrinis