
Communicating with a unix filter process in Python

I am writing a Python program that needs to clean many small strings using an external unix program which works as a filter. Currently, I create a new subprocess for each string I want to clean:

import subprocess
def cleanstring(s):
    proc = subprocess.Popen(['/bin/filter', '-n'],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode, so communicate() takes str
    )
    out, err = proc.communicate(s)
    assert not err
    return out

Obviously, this approach is grossly inefficient. What would be an efficient way to start the filter subprocess and communicate with it via stdin/stdout for as long as needed?

I've been looking into using Python Queues to implement this, but they may be overkill here. The code will be called from a Django view on a non-threaded web server, so only a single thread will be calling it, multiple times.

Thanks!

asked Mar 08 '26 by m000

1 Answer

If you haven't measured it, then it's not a performance problem, much less "grossly inefficient".

That said, you can communicate with a subprocess like this:

import subprocess
import sys

p = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT, universal_newlines=True)

for i in range(10):
    p.stdin.write('%s*2\n' % (i,))
    p.stdin.flush()  # flush before reading, or bc may never see the line
    res = p.stdout.readline()
    if res:
        print("vtrip says %s*2 is %s" % (i, res.strip()))
This prints the doubles of 0-9, all computed by the same long-running bc process. It should be easy to adapt to detex; the main thing is to handle flushing correctly, so that one end doesn't get stuck waiting for the other.
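The same pattern can be wrapped in a small reusable class that keeps one filter process alive across many calls. This is only a sketch: the filter command below is a hypothetical stand-in (an unbuffered Python one-liner that upper-cases its input), since the actual `/bin/filter` or detex behavior isn't known here. It assumes the real filter reads one line and writes exactly one line back.

```python
import subprocess
import sys

class LineFilter:
    """Keep one filter subprocess alive and feed it one line per request."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            universal_newlines=True, bufsize=1,  # line-buffered text mode
        )

    def clean(self, s):
        # One line in, one line out; flush so the child sees the input now.
        self.proc.stdin.write(s + '\n')
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip('\n')

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()

# Demo with a stand-in filter: an unbuffered upper-casing loop.
filt = LineFilter([sys.executable, '-u', '-c',
                   'import sys\n'
                   'for line in sys.stdin: sys.stdout.write(line.upper())'])
print(filt.clean('hello'))  # HELLO
filt.close()
```

The `bufsize=1` plus the explicit `flush()` are what keep the two processes from deadlocking on buffered pipes.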

That's the communicating part. As for keeping a long-running process inside Django, that might not be a good idea. Queues might indeed be too much for this.

And task queues like Celery et al. are meant for tasks that are handled independently, not for one long-running service that handles every request.

Maybe run a small Python daemon on the side that keeps the filter process open and handles requests from Django? Are we talking about heavy load, or something internal with, say, 100 users per day? In the latter case, you might not need much synchronisation beyond some crude locking.
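"Crude locking" could be as simple as a `threading.Lock` around the write/read pair on a shared subprocess, so two concurrent callers can't interleave their lines on the same pipes. A minimal sketch, using `cat` as a stand-in for the real filter (so it assumes a Unix system and a strictly line-for-line filter):

```python
import subprocess
import threading

# One shared, long-lived filter process; 'cat' is a stand-in that
# simply echoes each input line back.
proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, universal_newlines=True)
lock = threading.Lock()

def cleanstring(s):
    # The lock serializes the whole write-flush-read cycle, so replies
    # can't get matched to the wrong caller.
    with lock:
        proc.stdin.write(s + '\n')
        proc.stdin.flush()
        return proc.stdout.readline().rstrip('\n')

print(cleanstring('hello'))  # hello ('cat' echoes its input)
```

On a non-threaded server the lock is a no-op in practice, but it makes the helper safe if the deployment ever changes.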

answered Mar 09 '26 by Hejazzman


