I have a folder with 500 input files (~500 MB total).
I'd like to write a Python script that does the following:
(1) loads all of the input files into memory
(2) initializes an empty Python list that will later be used ... see bullet (4)
(3) starts 15 different (independent) processes: each of them uses the same input data [from (1)] but applies a different algorithm to process it, thus generating different results
(4) has all the independent processes [from step (3)] store their output in the same Python list [the list initialized in step (2)]
Once all 15 processes have completed their run, I will have one Python list that includes the results of all 15 independent processes.
My question is: is it possible to do the above efficiently in Python? If so, can you provide a scheme / sample code that illustrates how to do so?
Note #1: I will be running this on a powerful multi-core server, so the goal here is to use all of the processing power while sharing some memory {input data, output list} among all the independent processes. A standard-library sketch of one way to do this follows these notes.
Note #2: I am working in a Linux environment.
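For reference, here is a minimal standard-library sketch of steps (1)-(4), under the assumption that each of the 15 algorithms can be written as a plain top-level function (load_input, algorithm_a/algorithm_b, ALGORITHMS, and run_algorithm are all hypothetical names). On Linux, multiprocessing forks the parent, so input data loaded before the pool is created is visible to every child without copying (copy-on-write), and Pool.map gathers all the results into a single list in the parent:

import multiprocessing as mp
import os

def load_input(folder):
    # step (1): read every file in the folder into memory once, in the parent
    data = {}
    for name in os.listdir(folder):
        with open(os.path.join(folder, name), 'rb') as f:
            data[name] = f.read()
    return data

# hypothetical stand-ins for the 15 independent algorithms
def algorithm_a(data):
    return ('a', sum(len(v) for v in data.values()))

def algorithm_b(data):
    return ('b', len(data))

ALGORITHMS = [algorithm_a, algorithm_b]  # ... 15 entries in practice

INPUT_DATA = None  # populated in the parent before the fork

def run_algorithm(algo):
    # runs in a child process; INPUT_DATA was inherited via fork
    return algo(INPUT_DATA)

if __name__ == '__main__':
    INPUT_DATA = load_input('input_folder')  # hypothetical folder name
    # steps (2)-(4): the pool collects one result per algorithm into a list
    with mp.Pool(processes=len(ALGORITHMS)) as pool:
        results = pool.map(run_algorithm, ALGORITHMS)
    print(results)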
OK, I just whipped this up using zeromq to demonstrate a single subscriber to multiple publishers. You could probably do the same with queues, but you would need to manage them a bit more (see the queue-based sketch after the demo). zeromq sockets just work, which makes it nice for things like this, IMO.
"""
demo of multiple processes doing processing and publishing the results
to a common subscriber
"""
from multiprocessing import Process
class Worker(Process):
def __init__(self, filename, bind):
self._filename = filename
self._bind = bind
super(Worker, self).__init__()
def run(self):
import zmq
import time
ctx = zmq.Context()
result_publisher = ctx.socket(zmq.PUB)
result_publisher.bind(self._bind)
time.sleep(1)
with open(self._filename) as my_input:
for l in my_input.readlines():
result_publisher.send(l)
if __name__ == '__main__':
import sys
import os
import zmq
#assume every argument but the first is a file to be processed
files = sys.argv[1:]
# create a worker for each file to be processed if it exists pass
# in a bind argument instructing the socket to communicate via ipc
workers = [Worker(f, "ipc://%s_%s" % (f, i)) for i, f \
in enumerate((x for x in files if os.path.exists(x)))]
# create subscriber socket
ctx = zmq.Context()
result_subscriber = ctx.socket(zmq.SUB)
result_subscriber.setsockopt(zmq.SUBSCRIBE, "")
# wire up subscriber to whatever the worker is bound to
for w in workers:
print w._bind
result_subscriber.connect(w._bind)
# start workers
for w in workers:
print "starting workers..."
w.start()
result = []
# read from the subscriber and add it to the result list as long
# as at least one worker is alive
while [w for w in workers if w.is_alive()]:
result.append(result_subscriber.recv())
else:
# output the result
print result
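Assuming you save the demo as demo.py (a hypothetical name) and point it at some existing input files, you would run it like so and get the collected result list printed at the end:

$ python demo.py file1.txt file2.txt file3.txt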
Oh, and to get zmq, just:
$ pip install pyzmq-static
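For comparison, here is a rough sketch of the same fan-in pattern using a multiprocessing.Queue instead of zmq (the per-line "processing" is the same toy logic as in the demo above). The extra management mentioned earlier is mostly sentinel bookkeeping so the parent knows when every worker has finished:

from multiprocessing import Process, Queue

def worker(filename, out_q):
    # same toy "processing" as the zmq demo: ship each line to the parent
    with open(filename) as my_input:
        for line in my_input:
            out_q.put(line)
    out_q.put(None)  # sentinel: this worker is done

if __name__ == '__main__':
    import sys
    import os

    files = [f for f in sys.argv[1:] if os.path.exists(f)]
    out_q = Queue()
    procs = [Process(target=worker, args=(f, out_q)) for f in files]
    for p in procs:
        p.start()

    result = []
    done = 0
    # keep reading until we have seen one sentinel per worker
    while done < len(procs):
        item = out_q.get()
        if item is None:
            done += 1
        else:
            result.append(item)

    for p in procs:
        p.join()
    print(result)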