I am totally new to multiprocessing. I am trying to change my code so that part of it runs simultaneously.
I have a huge list, and I have to call an API for each node. Since the API calls are independent, I don't need the result of the first one in order to proceed to the second one. So, I have this code:
    def xmlpart1(id):
        ..call the api..
        ..retrieve the xml..
        ..find the part of xml I want..
        return xml_part1

    def xmlpart2(id):
        ..call the api..
        ..retrieve the xml..
        ..find the part of xml I want..
        return xml_part2
    import json

    def main(index):
        mylist = [[..,..],[..,..],[..,..],[..,...]]  # A huge list of lists with ids I need for calling the APIs
        myL = mylist[index]
        mydic = {}
        for i in myL:
            flag1 = xmlpart1(i)
            flag2 = xmlpart2(i)
            mydic[flag1] = flag2
        root = "myfilename %s.json" % (str(index))
        with open(root, "w") as f:  # text mode works with json.dump on Python 2 and 3
            json.dump(mydic, f)

    from multiprocessing import Pool

    if __name__ == '__main__':
        Pool().map(main, [0, 1, 2, 3])
After a few suggestions from here and from the chat, I ended up with this code. The problem is still there. I started the script at 9:50. At 10:25 the first file, "myfilename 0.json", appeared in my folder. Now it is 11:25 and none of the other files has appeared. The sublists have equal length and do the same work, so they should take approximately the same time.
This is something more suited to the multiprocessing.Pool() class.
Here's a simple example:
    from multiprocessing import Pool

    def job(args):
        """Your job function"""

    Pool().map(job, inputs)
Where:
inputs is your list of inputs. Each input is passed to job and processed in one of the pool's worker processes. You get the results back as a list when all jobs have completed.
multiprocessing.Pool().map is just like the Python builtin map() but sets up a process pool of workers for you and passes each input to the given function.
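To make the pattern concrete, here is a minimal runnable sketch. The names square and run, and the choice of two workers, are made up for illustration and are not part of the question's code; square simply stands in for a real job such as an API call.

```python
from multiprocessing import Pool


def square(n):
    """Hypothetical stand-in for a real job: returns n squared."""
    return n * n


def run(inputs, workers=2):
    """Map square over inputs using a pool of `workers` processes."""
    pool = Pool(processes=workers)
    try:
        # map() blocks until every input has been processed and
        # returns the results in the same order as the inputs.
        return pool.map(square, inputs)
    finally:
        # Stop accepting work and wait for the workers to exit.
        pool.close()
        pool.join()


if __name__ == '__main__':
    print(run([0, 1, 2, 3]))  # prints [0, 1, 4, 9]
```

The job function is defined at module level so that the worker processes can find it, and the call sits under the __name__ guard so that platforms which start workers by re-importing the script do not launch a new pool in every worker.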
See the docs for more details: http://docs.python.org/2/library/multiprocessing.html