python multiprocessing apply_async only uses one process

Tags: python, multiprocessing

I have a script that opens each file in a list and then does something to the text within that file. I'm using Python's multiprocessing module and Pool to try to parallelize this operation. An abstraction of the script is below:

import os
from multiprocessing import Pool

results = []
def testFunc(files):
    for file in files:
        print "Working in Process #%d" % (os.getpid())
        #This is just an illustration of some logic. This is not what I'm actually doing.
        for line in file:
            if 'dog' in line:
                results.append(line)

if __name__=="__main__":
    p = Pool(processes=2)
    files = ['/path/to/file1.txt', '/path/to/file2.txt']
    results = p.apply_async(testFunc, args = (files,))
    results2 = results.get()

When I run this, the process ID printed is the same for each iteration. What I'm trying to do is take each element of the input list and hand it to a separate process, but it seems that one process is doing all of the work.

321

asked Sep 18 '12 19:09

user1074057

2 Answers

apply_async farms out one task to the pool. In your code the whole list of files is that single task, so one worker handles all of it. You would need to call apply_async once per file to exercise more processors.
Don't have the worker processes append to the module-level list, results. Since the pool workers are separate processes, each one modifies its own copy of the list and the parent's list is never updated. One way to work around this is to use an output Queue. You could set it up yourself, or rely on apply_async's callback, which handles the queueing for you: the pool calls the callback in the parent process with the function's return value once the function completes.
You could use map_async instead of apply_async, but then you'd get a list of lists (one list per file), which you'd then have to flatten.

So, perhaps try instead something like:

import os
import multiprocessing as mp

results = []   

def testFunc(file):
    result = []
    print("Working in Process #%d" % (os.getpid()))
    # This is just an illustration of some logic. This is not what I'm
    # actually doing.
    with open(file, 'r') as f:
        for line in f:
            if 'dog' in line:
                result.append(line)
    return result


def collect_results(result):
    results.extend(result)

if __name__ == "__main__":
    p = mp.Pool(processes=2)
    files = ['/path/to/file1.txt', '/path/to/file2.txt']
    for f in files:
        p.apply_async(testFunc, args=(f, ), callback=collect_results)
    p.close()
    p.join()
    print(results)
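
The map_async variant mentioned above, including the flattening step, could look roughly like the sketch below. It is only an illustration under some assumptions: it targets Python 3, the names find_dog_lines, flatten and write_sample_files are made up for this example, and it writes two small throwaway files so that it runs without the real input paths.

```python
import itertools
import multiprocessing as mp
import os
import shutil
import tempfile


def find_dog_lines(path):
    """Return the lines of one file that contain 'dog' (runs in a worker)."""
    with open(path, 'r') as f:
        return [line for line in f if 'dog' in line]


def flatten(list_of_lists):
    """Merge the per-file result lists into one flat list, keeping order."""
    return list(itertools.chain.from_iterable(list_of_lists))


def write_sample_files(directory):
    """Create two small throwaway input files (hypothetical stand-ins for real data)."""
    contents = {'file1.txt': 'dog one\ncat\n', 'file2.txt': 'bird\ndog two\n'}
    paths = []
    for name in sorted(contents):
        path = os.path.join(directory, name)
        with open(path, 'w') as f:
            f.write(contents[name])
        paths.append(path)
    return paths


if __name__ == "__main__":
    tmpdir = tempfile.mkdtemp()
    try:
        files = write_sample_files(tmpdir)
        pool = mp.Pool(processes=2)
        # map_async yields one result list per input file, in input order.
        per_file = pool.map_async(find_dog_lines, files).get()
        pool.close()
        pool.join()
        print(flatten(per_file))
    finally:
        # Remove the throwaway files again.
        shutil.rmtree(tmpdir)
```

Compared with the callback version, nothing is shared between processes here: each worker returns its own list and the parent merges them after get() returns.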

193

answered Sep 19 '22 23:09

unutbu

Maybe in this case you should use map_async:

import os
from multiprocessing import Pool

def testFunc(file):
    message = "Working in Process #%d" % (os.getpid())
    # This is just an illustration of some logic. This is not what I'm actually doing.
    matches = []
    # Open the path; iterating over the string itself would yield characters.
    with open(file, 'r') as f:
        for line in f:
            if 'dog' in line:
                matches.append(line)
    # Return the matches: a global list filled in a worker never reaches the parent.
    return message, matches

if __name__=="__main__":
    p = Pool(processes=2)
    files = ['/path/to/file1.txt', '/path/to/file2.txt']
    results = p.map_async(testFunc, files)
    print(results.get())
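
To check whether more than one worker actually took part, one option is to return the worker's PID from each task and look at the distinct values. The sketch below assumes Python 3; report_worker and distinct_pids are hypothetical helper names, and the paths are used only as labels and are never opened. With tasks this short, the pool does not guarantee that every worker receives one, so seeing a single PID is possible.

```python
import os
from multiprocessing import Pool


def report_worker(path):
    """Return the PID of the process handling this task, plus its input."""
    return os.getpid(), path


def distinct_pids(tagged_results):
    """Collect the set of worker PIDs seen across all task results."""
    return set(pid for pid, _ in tagged_results)


if __name__ == "__main__":
    # Hypothetical inputs; the paths are only labels here and are never opened.
    files = ['/path/to/file%d.txt' % i for i in range(8)]
    p = Pool(processes=2)
    tagged = p.map_async(report_worker, files).get()
    p.close()
    p.join()
    print("parent PID: %d" % os.getpid())
    print("worker PIDs seen: %s" % sorted(distinct_pids(tagged)))
```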

answered Sep 18 '22 23:09

Odomontois
