 

How to make python scripts pipe-able both in bash and within python

Summary: I'd like to write python scripts that act like bash scripts on the command line, but then I'd also like to pipe them together easily in python. Where I'm having trouble is the glue to make the latter happen.

So imagine I wrote two scripts, script1.py and script2.py and I can pipe them together like so:

echo input_string | ./script1.py -a -b | ./script2.py -c -d

How do I get this behavior from within another python file? Here's the way I know, but I don't like:

arg_string_1 = convert_to_args(param_1, param_2)
arg_string_2 = convert_to_args(param_3, param_4)
output_string = subprocess.check_output("echo " + input_string + " | ./script1.py " + arg_string_1 + " | ./script2.py " + arg_string_2, shell=True)
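If shelling out is acceptable, the string-building can at least be avoided by chaining subprocess.Popen objects, connecting one process's stdout to the next one's stdin. A runnable sketch, with two `tr` commands standing in for the hypothetical ./script1.py and ./script2.py:

```python
import subprocess

# `tr` commands stand in for ./script1.py -a -b and ./script2.py -c -d
# so that this sketch is runnable as-is.
p1 = subprocess.Popen(["tr", "a-z", "A-Z"],
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE)
p2 = subprocess.Popen(["tr", "_", "-"],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdin.write(b"input_string\n")
p1.stdin.close()
p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
output_string = p2.communicate()[0].decode()
```

This still runs the scripts as separate processes, but the argument lists stay as lists, so there is no quoting/escaping problem.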

If I didn't want to take advantage of multithreading, I could do something like this (?):

input1  = StringIO(input_string)
output1 = StringIO()
script1.main(param_1, param_2, input1, output1)
input2  = StringIO(output1.getvalue())
output2 = StringIO()
script2.main(param_3, param_4, input2, output2)
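That sequential version does work; here is a self-contained sketch, with a trivial stand-in (`shout_main`, a hypothetical name) playing the role of both scripts' main functions:

```python
from io import StringIO

def shout_main(inp, out):
    # Stand-in for script1.main / script2.main: uppercase each line.
    for line in inp:
        print(line.rstrip("\n").upper(), file=out)

stage1_out = StringIO()
shout_main(StringIO("input_string\n"), stage1_out)
stage2_out = StringIO()
shout_main(StringIO(stage1_out.getvalue()), stage2_out)
result = stage2_out.getvalue()
```

The drawback, as noted, is that each stage must finish completely before the next one starts.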

Here's the approach I was trying, but I got stuck at writing the glue. I'd appreciate either learning how to finish my approach below, or suggestions for a better design/approach!

My approach: I wrote script1.py and script2.py to look like:

#!/usr/bin/python3

... # import sys and define "parse_args"

def main(param_1, param_2, input, output):
  for line in input:
    ...
    print(stuff, file=output)

if __name__ == "__main__":
  parameter_1, parameter_2 = parse_args(sys.argv)
  main(parameter_1, parameter_2, sys.stdin, sys.stdout)

Then I wanted to write something like this, but don't know how to finish:

pipe_out, pipe_in = ????
output = StringIO()
thread_1 = Thread(target=script1.main, args=(param_1, param_2, StringIO(input_string), pipe_out))
thread_2 = Thread(target=script2.main, args=(param_3, param_4, pipe_in, output))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()
output_str = output.getvalue()
asked Dec 25 '15 by usul




2 Answers

For the "pipe in", use sys.stdin with the readlines() method. (The read() method would instead return the whole input as one string rather than a list of lines.)

For passing information from one thread to another, you can use Queue. You must define one way to signal the end of data. In my example, since all data passed between threads is str, I simply use a None object to signal the end of data (since it cannot appear in the transmitted data).

One could also use more threads, or use different functions in threads.

I did not include sys.argv handling in my example, to keep it simple. Modifying it to read parameters (parameter1, ...) should be easy.

import sys
from threading import Thread
from Queue import Queue   # Python 2; in Python 3 this becomes "from queue import Queue"

def stdin_to_queue( output_queue ):
  for inp_line in sys.stdin.readlines():     # read input one line at a time
    output_queue.put( inp_line, True, None )  # blocking, no timeout
  output_queue.put( None, True, None )    # signal the end of data                                                  


def main1(input_queue, output_queue, arg1, arg2):
  do_loop = True
  while do_loop:
    inp_data = input_queue.get(True)
    if inp_data is None:
      do_loop = False
      output_queue.put( None, True, None )  # signal end of data                                                    
    else:
      out_data = arg1 + inp_data.strip('\r\n').upper() + arg2 #  or whatever transformation...                                    
      output_queue.put( out_data, True, None )

def queue_to_stdout(input_queue):
  do_loop = True
  while do_loop:
    inp_data = input_queue.get(True)
    if inp_data is None:
      do_loop = False
    else:
      sys.stdout.write( inp_data )


def main():
  q12 = Queue()
  q23 = Queue()
  q34 = Queue()
  t1 = Thread(target=stdin_to_queue, args=(q12,) )
  t2 = Thread(target=main1, args=(q12,q23,'(',')') )
  t3 = Thread(target=main1, args=(q23,q34,'[',']') )
  t4 = Thread(target=queue_to_stdout, args=(q34,))
  t1.start()
  t2.start()
  t3.start()
  t4.start()


main()

Finally, I tested this program (Python 2) with a text file:

head sometextfile.txt | python script.py 
answered Oct 18 '22 by Sci Prog


Redirect the return value to stdout depending on whether the script is being run from the command line:

#!/usr/bin/python3
import sys

# Example function
def main(input):
    # Do something with input producing stuff
    ...
    return multipipe(stuff)

if __name__ == '__main__':
    def multipipe(data):
        print(data)

    input = parse_args(sys.argv)
    main(input)
else:
    def multipipe(data):
        return data

Each other script will have the same two definitions of multipipe. Now, use multipipe for output.

If you call all the scripts together from the command line $ ./script1.py | ./script2.py, each will have __name__ == '__main__' and so multipipe will print it all to stdout to be read as an argument by the next script (and return None, so each function returns None, but you're not looking at the return values anyway in this case).

If you call them within some other python script, each function will return whatever you passed to multipipe.

Effectively, you can use your existing functions, just replace print(stuff, file=output) with return multipipe(stuff). Nice and simple.
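As a self-contained illustration of that in-Python composition (the function names here are hypothetical stand-ins for script1.main and script2.main, and multipipe is the imported-module variant that just returns its argument):

```python
def multipipe(data):
    # The "imported" variant: just hand the data back to the caller.
    return data

def main1(text):
    # Stand-in for script1.main: uppercase the input.
    return multipipe(text.upper())

def main2(text):
    # Stand-in for script2.main: reverse the input.
    return multipipe(text[::-1])

# Piping in Python becomes ordinary function composition.
result = main2(main1("input_string"))
```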

To use it with multithreading or multiprocessing, set the functions up so that each function returns a single thing, and plug them into a simple function that adds data to a multithreading queue. For an example of such a queueing system, see the sample at the bottom of the Queue docs. With that example, just make sure that each step in the pipeline puts None (or other sentinel value of your choice - I like ... for that since it's extremely rare that you'd pass the Ellipsis object for any reason other than as a marker for its singleton-ness) in the queue to the next one to signify done-ness.
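A minimal Python 3 sketch of one such queue-connected stage, using Ellipsis as the end-of-data sentinel (the stage and queue names are illustrative):

```python
import queue
import threading

SENTINEL = ...  # Ellipsis as the end-of-data marker, as suggested above

def upper_stage(q_in, q_out):
    # One pipeline stage: transform items until the sentinel arrives,
    # then pass the sentinel downstream so the next stage also stops.
    while True:
        item = q_in.get()
        if item is SENTINEL:
            q_out.put(SENTINEL)
            break
        q_out.put(item.upper())

q1, q2 = queue.Queue(), queue.Queue()
t = threading.Thread(target=upper_stage, args=(q1, q2))
t.start()
for word in ["a", "b"]:
    q1.put(word)
q1.put(SENTINEL)
t.join()

results = []
while True:
    item = q2.get()
    if item is SENTINEL:
        break
    results.append(item)
```

More stages chain the same way: each one reads from the previous stage's queue and forwards the sentinel when it sees it.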

answered Oct 18 '22 by Vivian