
Python - Limit amount of data subprocess.Popen can produce

I found many similar questions about measuring the size of an object at run time in Python. Some of the answers suggest setting a limit on the memory of the subprocess. I do not want to limit the subprocess's memory. Here is what I want:

I'm using subprocess.Popen() to execute an external program. I can get its standard output and standard error with process.stdout.readlines() and process.stderr.readlines() after the process completes.

The problem arises when a faulty program gets into an infinite loop and keeps producing output. Since subprocess.Popen() stores the output data in memory, this infinite loop quickly eats up all available memory and my program slows down.

One solution is to run the command with a timeout. But the programs take a variable amount of time to complete. A large timeout, applied to a program that normally finishes quickly but is stuck in an infinite loop, defeats the purpose of having one.

Is there a simple way to put an upper limit, say 200 MB, on the amount of data the command can produce? If it exceeds the limit, the command should be killed.

Aryaveer asked May 02 '13 07:05




2 Answers

First: it is not subprocess.Popen() that stores the data. The data passes through the pipe between "us" and "our" subprocess, and a pipe only holds a small, fixed amount; once it is full, the subprocess blocks until we read.

You shouldn't use readlines() in this case: it keeps buffering data until end-of-file and only then returns it as a list, so with endless output it never returns (here it is indeed this function that stores the data).

If you do something like

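total_bytes = total_lines = 0  # avoid shadowing the built-in name `bytes`
for line in process.stdout:
    total_bytes += len(line)
    total_lines += 1
    if total_bytes > 200000000 or total_lines > 10000:
        # limit exceeded: stop the child so it produces no further data
        process.kill()
        break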

you can act as described in your question. But don't forget to kill the subprocess (process.kill()) once the limit is hit, so that it stops producing further data.
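A fuller sketch of the same idea, wrapped in a function. The name run_with_output_limit, the chunk size and the decision to discard stderr are assumptions made for illustration, not part of the question. It reads fixed-size chunks instead of lines, because a program that never prints a newline would otherwise still be buffered without bound by the line iterator.

```python
import subprocess
import sys


def run_with_output_limit(cmd, max_bytes, chunk_size=65536):
    """Run cmd and collect its stdout, killing it if it writes more than max_bytes.

    Returns (output_bytes, limit_exceeded). The name and parameters are
    hypothetical; stderr is discarded here to keep the sketch short.
    """
    process = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    chunks = []
    total = 0
    exceeded = False
    try:
        while True:
            # Blocks until chunk_size bytes are available or the pipe hits EOF.
            chunk = process.stdout.read(chunk_size)
            if not chunk:  # EOF: the child closed its stdout
                break
            total += len(chunk)
            if total > max_bytes:
                exceeded = True
                process.kill()  # stop the child from producing more data
                break
            chunks.append(chunk)
    finally:
        process.stdout.close()
        process.wait()  # reap the child so it does not linger as a zombie
    return b"".join(chunks), exceeded


if __name__ == "__main__":
    endless = [sys.executable, "-c", "while True: print('x' * 100)"]
    output, exceeded = run_with_output_limit(endless, max_bytes=200 * 1000 * 1000)
    print(len(output), exceeded)
```

Memory use stays bounded by max_bytes plus one chunk, regardless of how long the child would have kept running.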

If you want to handle stderr as well, you have to replicate process.communicate()'s behaviour with select() or similar, reading from both pipes and applying the limit yourself.
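One possible way to do that, as a rough sketch only: the standard selectors module (a wrapper around select() and friends) can wait on both pipes at once. The function name and the choice to apply one combined limit to stdout plus stderr are assumptions; this approach works on POSIX systems, since select() does not support pipes on Windows.

```python
import os
import selectors
import subprocess
import sys


def run_with_combined_limit(cmd, max_bytes, chunk_size=65536):
    """Collect stdout and stderr, killing the child once their combined
    size exceeds max_bytes.

    Returns (stdout_bytes, stderr_bytes, limit_exceeded). Hypothetical
    helper; POSIX only.
    """
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    collected = {"stdout": [], "stderr": []}
    total = 0
    exceeded = False
    with selectors.DefaultSelector() as selector:
        selector.register(process.stdout, selectors.EVENT_READ, "stdout")
        selector.register(process.stderr, selectors.EVENT_READ, "stderr")
        # Loop until both pipes reached EOF or the limit was hit.
        while selector.get_map() and not exceeded:
            for key, _ in selector.select():
                # os.read returns whatever is available, up to chunk_size.
                chunk = os.read(key.fd, chunk_size)
                if not chunk:  # EOF on this pipe
                    selector.unregister(key.fileobj)
                    continue
                total += len(chunk)
                if total > max_bytes:
                    exceeded = True
                    process.kill()
                    break
                collected[key.data].append(chunk)
    process.stdout.close()
    process.stderr.close()
    process.wait()
    return b"".join(collected["stdout"]), b"".join(collected["stderr"]), exceeded


if __name__ == "__main__":
    noisy = [sys.executable, "-c", "import sys\nwhile True: sys.stderr.write('x' * 100)"]
    out, err, exceeded = run_with_combined_limit(noisy, max_bytes=200 * 1000 * 1000)
    print(len(out), len(err), exceeded)
```

Reading both pipes in one loop also avoids the deadlock that can occur when only stdout is drained while the child is blocked writing to a full stderr pipe.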

glglgl answered Oct 13 '22 00:10


There doesn't seem to be an easy answer to what you want.

http://linux.about.com/library/cmd/blcmdl2_setrlimit.htm

setrlimit has options to limit memory, CPU time or the number of open files, but apparently nothing to limit the amount of I/O.

You should handle this case manually, as already described.

LtWorf answered Oct 13 '22 01:10