
Writing to a file with multiprocessing

I'm having the following problem in Python.

I need to run some calculations in parallel and write their results sequentially to a file. So I created a function that receives a multiprocessing.Queue and a file handle, does the calculation, and prints the result to the file:

import multiprocessing
from multiprocessing import Process, Queue
from mySimulation import doCalculation   

# doCalculation(pars) is a function I must run for many different sets of parameters and collect the results in a file

def work(queue, fh):
    while True:
        try:
            parameter = queue.get(block = False)
            result = doCalculation(parameter)
            print >>fh, result
        except:
            break


if __name__ == "__main__":
    nthreads = multiprocessing.cpu_count()
    fh = open("foo", "w")
    workQueue = Queue()
    parList = # list of conditions for which I want to run doCalculation()
    for x in parList:
        workQueue.put(x)
    processes = [Process(target = work, args = (workQueue, fh)) for i in range(nthreads)]
    for p in processes:
       p.start()
    for p in processes:
       p.join()
    fh.close()

But the file ends up empty after the script runs. I tried to change the work() function to:

def work(queue, filename):
    while True:
        try:
            fh = open(filename, "a")
            parameter = queue.get(block = False)
            result = doCalculation(parameter)
            print >>fh, result
            fh.close()
        except:
            break

and pass the filename as a parameter. Then it works as I intended. When I do the same thing sequentially, without multiprocessing, it also works normally.

Why didn't it work in the first version? I can't see the problem.

Also: can I guarantee that two processes won't try to write to the file simultaneously?


EDIT:

Thanks. I got it now. This is the working version:

import multiprocessing
from multiprocessing import Process, Queue
from time import sleep
from random import uniform

def doCalculation(par):
    t = uniform(0,2)
    sleep(t)
    return par * par  # just to simulate some calculation

def feed(queue, parlist):
    for par in parlist:
        queue.put(par)

def calc(queueIn, queueOut):
    while True:
        try:
            par = queueIn.get(block = False)
            print "dealing with ", par, "" 
            res = doCalculation(par)
            queueOut.put((par,res))
        except:
            break

def write(queue, fname):
    fhandle = open(fname, "w")
    while True:
        try:
            par, res = queue.get(block = False)
            print >>fhandle, par, res
        except:
            break
    fhandle.close()

if __name__ == "__main__":
    nthreads = multiprocessing.cpu_count()
    fname = "foo"
    workerQueue = Queue()
    writerQueue = Queue()
    parlist = [1,2,3,4,5,6,7,8,9,10]
    feedProc = Process(target = feed , args = (workerQueue, parlist))
    calcProc = [Process(target = calc , args = (workerQueue, writerQueue)) for i in range(nthreads)]
    writProc = Process(target = write, args = (writerQueue, fname))


    feedProc.start()
    for p in calcProc:
        p.start()
    writProc.start()

    feedProc.join()
    for p in calcProc:
        p.join()
    writProc.join()
asked Jun 29 '11 by Rafael S. Calsaverini



2 Answers

You really should use two queues and three separate kinds of processing.

  1. Put stuff into Queue #1.

  2. Get stuff out of Queue #1 and do calculations, putting stuff in Queue #2. You can have many of these, since they get from one queue and put into another queue safely.

  3. Get stuff out of Queue #2 and write it to a file. You must have exactly one of these and no more. It "owns" the file, guarantees atomic access, and absolutely assures that the file is written cleanly and consistently. (A minimal sketch of this layout follows the list.)
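
Here is a minimal sketch of that layout. It is not from the original answer: the doCalculation body, the file name "foo", and the use of None as a stop sentinel are placeholder assumptions. Instead of polling with queue.get(block = False) inside a bare except, each worker stops when it receives a sentinel, and the single writer closes the file once every worker has reported in:

import multiprocessing
from multiprocessing import Process, Queue

def doCalculation(par):
    return par * par                          # placeholder for the real calculation

def calc(task_queue, result_queue):
    # Many of these can run at once; they only move data between queues.
    for par in iter(task_queue.get, None):    # None is the stop sentinel
        result_queue.put((par, doCalculation(par)))
    result_queue.put(None)                    # tell the writer this worker is done

def write(result_queue, fname, nworkers):
    # Exactly one of these owns the file, so writes can never interleave.
    fh = open(fname, "w")
    done = 0
    while done < nworkers:
        item = result_queue.get()
        if item is None:
            done += 1
        else:
            fh.write("%s %s\n" % item)
    fh.close()

if __name__ == "__main__":
    nworkers = multiprocessing.cpu_count()
    tasks, results = Queue(), Queue()
    for par in range(10):
        tasks.put(par)
    for _ in range(nworkers):
        tasks.put(None)                       # one sentinel per worker
    workers = [Process(target = calc, args = (tasks, results)) for _ in range(nworkers)]
    writer = Process(target = write, args = (results, "foo", nworkers))
    for w in workers:
        w.start()
    writer.start()
    for w in workers:
        w.join()
    writer.join()

Because only the writer process ever touches the file, no locking is needed; the workers hand their results over through the second queue.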

answered by S.Lott


If anyone is looking for a simple way to do the same thing, this can help. I don't think there are any disadvantages to doing it this way; if there are, please let me know.

import multiprocessing
import re

def mp_worker(item):
    # Do something with the item; counting its words is just a placeholder
    count = len(re.findall(r'\w+', item))
    return item, count

def mp_handler():
    cpus = multiprocessing.cpu_count()
    p = multiprocessing.Pool(cpus)
    # The next two lines populate listX, which is then processed in parallel.
    # Any other way of building listX works too, as long as it is passed to imap below.
    with open('ExampleFile.txt') as f:
        listX = [line for line in (l.strip() for l in f) if line]
    with open('results.txt', 'w') as f:
        for result in p.imap(mp_worker, listX):
            # result is an (item, count) tuple from the worker
            f.write('%s: %d\n' % result)
    p.close()
    p.join()

if __name__ == '__main__':
    mp_handler()
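
One nice property of this approach: Pool.imap yields results in the same order as the input list, so the output file preserves the input order even though workers finish at different times, and only the parent process writes to the file, so there is no risk of interleaved writes. If output order does not matter, imap_unordered avoids the head-of-line waiting and can be a bit faster.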

Source: Python: Writing to a single file with queue while using multiprocessing Pool

answered by Menezes Sousa