 

How to make python scripts pipe-able both in bash and within python

Summary: I'd like to write python scripts that act like bash scripts on the command line, but then I'd also like to pipe them together easily in python. Where I'm having trouble is the glue to make the latter happen.

So imagine I wrote two scripts, script1.py and script2.py and I can pipe them together like so:

echo input_string | ./script1.py -a -b | ./script2.py -c -d

How do I get this behavior from within another python file? Here's the way I know, but I don't like:

arg_string_1 = convert_to_args(param_1, param_2)
arg_string_2 = convert_to_args(param_3, param_4)
output_string = subprocess.check_output("echo " + input_string + " | ./script1.py " + arg_string_1 + " | ./script2.py " + arg_string_2, shell=True)
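If shelling out is acceptable, the string-building can at least be avoided by chaining subprocess.Popen objects, connecting one process's stdout to the next one's stdin. A runnable sketch, with two `tr` commands standing in for the hypothetical ./script1.py and ./script2.py:

```python
import subprocess

# `tr` commands stand in for ./script1.py -a -b and ./script2.py -c -d
# so that this sketch is runnable as-is.
p1 = subprocess.Popen(["tr", "a-z", "A-Z"],
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE)
p2 = subprocess.Popen(["tr", "_", "-"],
                      stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdin.write(b"input_string\n")
p1.stdin.close()
p1.stdout.close()  # let p1 receive SIGPIPE if p2 exits early
output_string = p2.communicate()[0].decode()
```

This still runs the scripts as separate processes, but the argument lists stay as lists, so there is no quoting/escaping problem.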

If I didn't want to take advantage of multithreading, I could do something like this (?):

input1  = StringIO(input_string)
output1 = StringIO()
script1.main(param_1, param_2, input1, output1)
input2  = StringIO(output1.getvalue())
output2 = StringIO()
script2.main(param_3, param_4, input2, output2)
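That sequential version does work; here is a self-contained sketch, with a trivial stand-in (`shout_main`, a hypothetical name) playing the role of both scripts' main functions:

```python
from io import StringIO

def shout_main(inp, out):
    # Stand-in for script1.main / script2.main: uppercase each line.
    for line in inp:
        print(line.rstrip("\n").upper(), file=out)

stage1_out = StringIO()
shout_main(StringIO("input_string\n"), stage1_out)
stage2_out = StringIO()
shout_main(StringIO(stage1_out.getvalue()), stage2_out)
result = stage2_out.getvalue()
```

The drawback, as noted, is that each stage must finish completely before the next one starts.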

Here's the approach I was trying, but I got stuck at writing the glue. I'd appreciate either learning how to finish my approach below, or suggestions for a better design/approach!

My approach: I wrote script1.py and script2.py to look like:

#!/usr/bin/python3

... # import sys and define "parse_args"

def main(param_1, param_2, input, output):
  for line in input:
    ...
    print(stuff, file=output)

if __name__ == "__main__":
  parameter_1, parameter_2 = parse_args(sys.argv)
  main(parameter_1, parameter_2, sys.stdin, sys.stdout)

Then I wanted to write something like this, but don't know how to finish:

pipe_out, pipe_in = ????
output = StringIO()
thread_1 = Thread(target=script1.main, args=(param_1, param_2, StringIO(input_string), pipe_out))
thread_2 = Thread(target=script2.main, args=(param_3, param_4, pipe_in, output))
thread_1.start()
thread_2.start()
thread_1.join()
thread_2.join()
output_str = output.getvalue()
asked Dec 25 '15 by usul




2 Answers

For the "pipe in", use sys.stdin with the readlines() method. (The read() method would instead return the whole input as one string rather than a list of lines.)

For passing information from one thread to another, you can use Queue. You must define one way to signal the end of data. In my example, since all data passed between threads is str, I simply use a None object to signal the end of data (since it cannot appear in the transmitted data).

One could also use more threads, or use different functions in threads.

I did not include sys.argv handling in my example, to keep it simple. Modifying it to read parameters (parameter1, ...) should be easy.

import sys
from threading import Thread
from Queue import Queue   # Python 2; in Python 3 this becomes "from queue import Queue"

def stdin_to_queue( output_queue ):
  for inp_line in sys.stdin.readlines():     # read input one line at a time
    output_queue.put( inp_line, True, None )  # blocking, no timeout
  output_queue.put( None, True, None )    # signal the end of data                                                  


def main1(input_queue, output_queue, arg1, arg2):
  do_loop = True
  while do_loop:
    inp_data = input_queue.get(True)
    if inp_data is None:
      do_loop = False
      output_queue.put( None, True, None )  # signal end of data                                                    
    else:
      out_data = arg1 + inp_data.strip('\r\n').upper() + arg2 #  or whatever transformation...                                    
      output_queue.put( out_data, True, None )

def queue_to_stdout(input_queue):
  do_loop = True
  while do_loop:
    inp_data = input_queue.get(True)
    if inp_data is None:
      do_loop = False
    else:
      sys.stdout.write( inp_data )


def main():
  q12 = Queue()
  q23 = Queue()
  q34 = Queue()
  t1 = Thread(target=stdin_to_queue, args=(q12,) )
  t2 = Thread(target=main1, args=(q12,q23,'(',')') )
  t3 = Thread(target=main1, args=(q23,q34,'[',']') )
  t4 = Thread(target=queue_to_stdout, args=(q34,))
  t1.start()
  t2.start()
  t3.start()
  t4.start()


main()

Finally, I tested this program (Python 2) with a text file:

head sometextfile.txt | python script.py 
answered Oct 18 '22 by Sci Prog


Redirect the return value to stdout depending on whether the script is being run from the command line:

#!/usr/bin/python3
import sys

# Example function
def main(input):
    # Do something with input producing stuff
    ...
    return multipipe(stuff)

if __name__ == '__main__':
    def multipipe(data):
        print(data)

    input = parse_args(sys.argv)
    main(input)
else:
    def multipipe(data):
        return data

Each other script will have the same two definitions of multipipe. Now, use multipipe for output.

If you call all the scripts together from the command line $ ./script1.py | ./script2.py, each will have __name__ == '__main__' and so multipipe will print it all to stdout to be read as an argument by the next script (and return None, so each function returns None, but you're not looking at the return values anyway in this case).

If you call them within some other python script, each function will return whatever you passed to multipipe.

Effectively, you can use your existing functions, just replace print(stuff, file=output) with return multipipe(stuff). Nice and simple.
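As a self-contained illustration of that in-Python composition (the function names here are hypothetical stand-ins for script1.main and script2.main, and multipipe is the imported-module variant that just returns its argument):

```python
def multipipe(data):
    # The "imported" variant: just hand the data back to the caller.
    return data

def main1(text):
    # Stand-in for script1.main: uppercase the input.
    return multipipe(text.upper())

def main2(text):
    # Stand-in for script2.main: reverse the input.
    return multipipe(text[::-1])

# Piping in Python becomes ordinary function composition.
result = main2(main1("input_string"))
```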

To use it with multithreading or multiprocessing, set the functions up so that each function returns a single thing, and plug them into a simple function that adds data to a multithreading queue. For an example of such a queueing system, see the sample at the bottom of the Queue docs. With that example, just make sure that each step in the pipeline puts None (or other sentinel value of your choice - I like ... for that since it's extremely rare that you'd pass the Ellipsis object for any reason other than as a marker for its singleton-ness) in the queue to the next one to signify done-ness.
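A minimal Python 3 sketch of one such queue-connected stage, using Ellipsis as the end-of-data sentinel (the stage and queue names are illustrative):

```python
import queue
import threading

SENTINEL = ...  # Ellipsis as the end-of-data marker, as suggested above

def upper_stage(q_in, q_out):
    # One pipeline stage: transform items until the sentinel arrives,
    # then pass the sentinel downstream so the next stage also stops.
    while True:
        item = q_in.get()
        if item is SENTINEL:
            q_out.put(SENTINEL)
            break
        q_out.put(item.upper())

q1, q2 = queue.Queue(), queue.Queue()
t = threading.Thread(target=upper_stage, args=(q1, q2))
t.start()
for word in ["a", "b"]:
    q1.put(word)
q1.put(SENTINEL)
t.join()

results = []
while True:
    item = q2.get()
    if item is SENTINEL:
        break
    results.append(item)
```

More stages chain the same way: each one reads from the previous stage's queue and forwards the sentinel when it sees it.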

answered Oct 18 '22 by Vivian