
Alternatives to Python Popen.communicate() memory limitations?

I have the following chunk of Python code (running v2.7) that raises MemoryError exceptions when I work with large (several GB) files:

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
myStdout, myStderr = myProcess.communicate()
sys.stdout.write(myStdout)
if myStderr:
    sys.stderr.write(myStderr)

According to the documentation for Popen.communicate(), the output is buffered in memory:

Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

Is there a way to disable this buffering, or force the cache to be cleared periodically while the process runs?

What alternative approach should I use in Python for running a command that streams gigabytes of data to stdout?

I should note that I need to handle both the output and error streams.

Asked by Alex Reynolds, Jul 29 '11 23:07


2 Answers

I think I found a solution:

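import sys
from subprocess import Popen, PIPE

myProcess = Popen(myCmd, shell=True, stdout=PIPE, stderr=PIPE)
# Iterate over each pipe line by line instead of holding all output in memory.
for ln in myProcess.stdout:
    sys.stdout.write(ln)
# Note: stderr is only drained after stdout has reached EOF.
for ln in myProcess.stderr:
    sys.stderr.write(ln)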

This seems to get my memory usage down enough to get through the task.

Update

I have recently found a more flexible way of handling data streams in Python, using threads. It's interesting that Python is so poor at something that shell scripts can do so easily!
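The threaded approach itself is not shown in this answer, so the following is only a sketch of one way it might look, not the author's actual code. It assumes Python 3 and uses two hypothetical helpers, pump and run_streaming. Each pipe gets its own thread, so stderr is drained while stdout is still being read. This avoids the stall that can occur when one pipe is read to EOF while the child is blocked writing to the other, full pipe.

```python
import subprocess
import threading


def pump(src, dst, chunk_size=64 * 1024):
    """Copy src to dst in fixed-size chunks until src reaches EOF."""
    # Only one chunk is held in memory at a time.
    for chunk in iter(lambda: src.read(chunk_size), b""):
        dst.write(chunk)
    src.close()


def run_streaming(cmd, out, err):
    """Run cmd, copying its stdout to out and its stderr to err concurrently.

    out and err are assumed to be writable binary streams.
    Returns the exit code of the command.
    """
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    workers = [
        threading.Thread(target=pump, args=(proc.stdout, out)),
        threading.Thread(target=pump, args=(proc.stderr, err)),
    ]
    for worker in workers:
        worker.start()
    # Both pipes hit EOF when the child closes them, so the joins return.
    for worker in workers:
        worker.join()
    return proc.wait()
```

On Python 3 this could be called as run_streaming(myCmd, sys.stdout.buffer, sys.stderr.buffer); on 2.7, sys.stdout and sys.stderr themselves accept byte strings.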

Answered by Alex Reynolds, Sep 27 '22 17:09


What I would probably do instead, if I needed to read the stdout for something that large, is send it to a file on creation of the process.

from subprocess import Popen

with open(my_large_output_path, 'w') as fo:
    with open(my_large_error_path, 'w') as fe:
        myProcess = Popen(myCmd, shell=True, stdout=fo, stderr=fe)
        myProcess.wait()  # block until the command has finished writing
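A slightly fuller sketch of the same idea, assuming Python 3: the child writes straight to disk, and the output is replayed afterwards in fixed-size chunks so that it never sits in memory all at once. The helper names run_to_files and replay are made up for illustration.

```python
import shutil
import subprocess


def run_to_files(cmd, out_path, err_path):
    """Run cmd with stdout and stderr redirected to files; return its exit code."""
    with open(out_path, "wb") as fo, open(err_path, "wb") as fe:
        proc = subprocess.Popen(cmd, shell=True, stdout=fo, stderr=fe)
        # Wait here so the files are complete before anyone reads them.
        return proc.wait()


def replay(path, dst, chunk_size=64 * 1024):
    """Copy the file at path to the binary stream dst, one chunk at a time."""
    with open(path, "rb") as src:
        shutil.copyfileobj(src, dst, chunk_size)
```

After run_to_files(myCmd, my_large_output_path, my_large_error_path) returns, replay(my_large_output_path, sys.stdout.buffer) would forward the saved output. The trade-off is disk space and the fact that nothing is forwarded until the command has finished.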

Edit: If you need to stream, you could try making a file-like object and passing it to stdout and stderr. (I haven't tried this, though.) You could then read (query) from the object as it's being written.
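One caveat on the file-like object idea: Popen hands the child a real file descriptor, so the object passed as stdout or stderr needs a working fileno(). A purely in-memory object such as io.BytesIO does not have one. The small check below, written for Python 3 with a hypothetical helper name accepts_as_stdout, illustrates the difference.

```python
import io
import subprocess
import tempfile


def accepts_as_stdout(target):
    """Return True if Popen can use target as the child's stdout."""
    try:
        proc = subprocess.Popen("echo hello", shell=True, stdout=target)
    except (io.UnsupportedOperation, AttributeError):
        # No usable fileno(): the object cannot be handed to the child.
        return False
    proc.wait()
    return True


if __name__ == "__main__":
    print(accepts_as_stdout(io.BytesIO()))      # in-memory buffer
    with tempfile.TemporaryFile() as tmp:
        print(accepts_as_stdout(tmp))           # real file on disk
```

So for streaming, the object has to be backed by an actual file or pipe; a custom class with only write() would not be enough.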

Answered by TorelTwiddler, Sep 27 '22 19:09