
How is Python's map_async keeping results in order?

I'm exploring Python's multiprocessing library (Python 3.3) and noticed an odd result from the map_async function that I've been unable to explain. I expected the results stored by the callback to be "out of order". That is, if I feed a number of tasks to the worker processes, some should complete before others, not necessarily in the order they were fed in or appear in the input list. However, I'm getting an ordered set of results that corresponds perfectly with the input tasks. This is the case even after purposely trying to "sabotage" some processes by slowing them down (which, presumably, would allow others to complete before them).

I have a print statement in the calculate function that shows the tasks are being calculated out of order, yet the results are still in order. That said, I'm not sure I can trust a print statement as a reliable indicator that things are actually being calculated out of order.

The test process (a general example): build a list of objects, each of which holds an integer. Submit that list of objects to map_async as arguments, along with the function "calculate", which updates the object's numValue attribute with its squared value and then returns the updated object.

Some code:

import time
import multiprocessing
import random

class NumberHolder():
    def __init__(self,numValue):
        self.numValue = numValue    #Only one attribute

def calculate(obj):
    if random.random() >= 0.5:
        startTime = time.time()
        timeWaster = [random.random() for x in range(5000000)] #Waste time.
        endTime = time.time()           #Establish end time
        print("%d object got stuck in here for %f seconds"%(obj.numValue,endTime-startTime))
    obj.numValue = obj.numValue ** 2    #Square the value, as described above
    return obj                          #Return the updated object to the parent process

#Main Process
if __name__ == '__main__':
    numbersToSquare = [x for x in range(0,100)]     #I'm 
    taskList = []

    for eachNumber in numbersToSquare:
        taskList.append(NumberHolder(eachNumber))   #Create a list of objects whose numValue is equal to the numbers we want to square

    results = [] #Where the results will be stored
    pool = multiprocessing.Pool(processes=max(1, multiprocessing.cpu_count() - 1)) #Don't use all my processing power, but always keep at least one worker.
    r = pool.map_async(calculate, taskList, callback=results.append)  #Using fxn "calculate", feed taskList, and values stored in "results" list
    r.wait()                # Wait on the results from the map_async

    results = results[0]    #All of the entries only exist in the first offset
    for eachObject in results:      #Loop through them and show them
        print(eachObject.numValue)          #If they calc'd "out of order", I'd expect append out of order

I found this well-written response, which seems to support the idea that map_async can have results that are "out of order": multiprocessing.Pool: When to use apply, apply_async or map? I also looked up the documentation ( http://docs.python.org/3.3/library/multiprocessing.html ). For map_async it says: "...If callback is specified then it should be a callable which accepts a single argument. When the result becomes ready callback is applied to it (unless the call failed). callback should complete immediately since otherwise the thread which handles the results will get blocked."

Am I misunderstanding how this is supposed to work? Any help is much appreciated.

asked Oct 30 '13 18:10 by Thomas


1 Answer

That's the expected behavior. The docs say:

A variant of the map() method which returns a result object.

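The "result object" is just a container class that holds the calculated results. When you call r.wait(), you wait until all of the results have been collected and put in order. Even though the tasks are processed out of order, the results come back in the original input order. The callback is also called only once, with that complete ordered list, which is why all of your entries end up in results[0].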
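Here is a minimal sketch of that behavior; it is not taken from the question's code. The names slow_square, run_map_async, completion_order and callback_calls are made up for illustration. It uses multiprocessing.pool.ThreadPool, a subclass of Pool with the same map_async method, only so that it runs without the __main__ guard; swapping in multiprocessing.Pool should give the same ordering, though completion_order would then have to be collected some other way, since worker processes do not share the parent's list.

```python
import time
from multiprocessing.pool import ThreadPool

completion_order = []   # hypothetical helper: records which input finished when


def slow_square(n):
    # Smaller inputs sleep longer, so later tasks tend to finish first.
    time.sleep((5 - n) * 0.05)
    completion_order.append(n)
    return n * n


def run_map_async():
    callback_calls = []   # gets one entry per callback invocation
    with ThreadPool(processes=5) as pool:
        async_result = pool.map_async(
            slow_square, range(5), callback=callback_calls.append
        )
        ordered = async_result.get()   # blocks until every task has finished
    return ordered, callback_calls


ordered, callback_calls = run_map_async()
print("completion order:", completion_order)   # usually not 0, 1, 2, 3, 4
print("map_async result:", ordered)            # always in input order
print("callback invocations:", len(callback_calls))
```

The completion order can vary between runs, but the list returned by get() follows the input order, and the callback receives that whole list in a single call.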

If you want the results to be yielded as they are calculated, use imap_unordered.
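A rough sketch of the imap_unordered alternative, under the same assumptions as before: slow_square_tagged and run_imap_unordered are illustrative names, and ThreadPool stands in for multiprocessing.Pool so the snippet runs without the __main__ guard. Each result carries its input so you can see which task it came from.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square_tagged(n):
    # Smaller inputs sleep longer, so they tend to be yielded last.
    time.sleep((3 - n) * 0.05)
    return n, n * n   # tag the result with its input


def run_imap_unordered():
    with ThreadPool(processes=3) as pool:
        # imap_unordered returns an iterator that yields each result
        # as soon as it is ready, regardless of input order.
        return list(pool.imap_unordered(slow_square_tagged, range(3)))


arrival = run_imap_unordered()
for n, squared in arrival:
    print("%d squared is %d" % (n, squared))
```

Because the arrival order is not guaranteed, returning the input alongside the result (or an index) is one way to match results back to their tasks.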

answered Sep 20 '22 07:09 by Blender