Python multiprocessing: print() inside apply_async()

print() inside the function that is passed to multiprocessing's apply_async() does not print out anything.

I eventually want to use apply_async to process a large text file in chunks, so I want the script to print to the screen how many lines have been processed. However, I don't see any output at all.

I've attached a toy example. Each foo() call should tell me which process is being used. In my actual code, I will call foo() on each chunk, and it will tell me how many lines of text in that chunk have been processed.

import os
from multiprocessing import Pool

def foo(x, y):
    print(f'Process: {os.getpid()}')
    return x * y

def bar(x):
    p = Pool()
    result_list = []
    for i in range(30):
        p.apply_async(foo, args=(i, i * x), callback=result_list.append)
    p.close()
    p.join()
    return result_list

if __name__ == '__main__':
    print(bar(2))

I got the list of x*y multiplication results printed, but I didn't see any output telling me the process ID.

Can anyone help me please?

Hoang Van Phan asked Nov 06 '22 16:11

1 Answer

Your sys.stdout is likely block buffered, which means a handful of small prints can sit in the buffer without ever filling it, so the buffer is never flushed to the screen or file. Normally, Python flushes the buffers on exit, so this isn't an issue.
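
To make the buffering concrete, here is a minimal sketch that does not touch multiprocessing at all. It builds an in-memory stand-in for a block-buffered stdout (the names raw and stream and the fake PID 1234 are made up for illustration) and shows that a short print stays in the buffer until something flushes it:

```python
import io

# In-memory sink standing in for the screen or a file (illustrative only).
raw = io.BytesIO()

# Text layer over a buffered binary layer, with line buffering turned off.
# This mirrors how a block-buffered sys.stdout is layered.
stream = io.TextIOWrapper(
    io.BufferedWriter(raw),
    encoding="utf-8",
    newline="\n",
    line_buffering=False,
)

print("Process: 1234", file=stream)
before_flush = raw.getvalue()  # still empty: the text is sitting in the buffer

stream.flush()
after_flush = raw.getvalue()   # the text reaches the sink only after the flush

print(before_flush)
print(after_flush)
```

On a normal interpreter exit, that flush is done for you.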

The problem is that, to avoid a bunch of tricky issues with doubled cleanup, multiprocessing workers exit using os._exit, which bypasses all cleanup procedures (including flushing stdio buffers). If you want to be sure the output is emitted, tell print to flush the output immediately by changing:

print(f'Process: {os.getpid()}')

to:

print(f'Process: {os.getpid()}', flush=True)
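
If you want to see the effect of os._exit in isolation, the following sketch may help. It is not multiprocessing itself: it launches a plain child interpreter with its stdout piped (so it is block buffered), has the child print once and then call os._exit, and compares what arrives with and without flush=True. The helper name run_child and the CHILD_TEMPLATE string are invented for this illustration.

```python
import os
import subprocess
import sys

# Child program: print once, then leave via os._exit, which skips the normal
# interpreter cleanup (including flushing stdio buffers).
CHILD_TEMPLATE = (
    "import os\n"
    "print('Process:', os.getpid(), flush={flush})\n"
    "os._exit(0)\n"
)

def run_child(flush):
    """Run the child with stdout piped and return whatever it managed to emit."""
    # Drop PYTHONUNBUFFERED so the child's piped stdout really is block buffered.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_TEMPLATE.format(flush=flush)],
        capture_output=True,
        text=True,
        env=env,
    )
    return completed.stdout

if __name__ == "__main__":
    print(repr(run_child(flush=False)))  # the buffered line is lost
    print(repr(run_child(flush=True)))   # the line survives because it was flushed
```

The same reasoning applies to the print() inside foo(): with flush=True the text leaves the buffer before the process has a chance to exit without flushing.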
ShadowRanger answered Nov 15 '22 11:11