
multiprocessing pool.map not processing list in order

I have this script to process some urls in parallel:

import multiprocessing
import time

list_of_urls = []

for i in range(1,1000):
    list_of_urls.append('http://example.com/page=' + str(i))

def process_url(url):
    page_processed = url.split('=')[1]
    print('Processing page %s' % page_processed)
    time.sleep(5)

pool = multiprocessing.Pool(processes=4)
pool.map(process_url, list_of_urls)

The list is ordered, but when I run it, the script doesn't pick urls from the list in order:

Processing page 1
Processing page 64
Processing page 127
Processing page 190
Processing page 65
Processing page 2
Processing page 128
Processing page 191

Instead, I would like it to process page 1,2,3,4 at first, then continue following the order in the list. Is there an option to do this?

Hyperion asked Nov 18 '16


1 Answer

If you do not pass a chunksize argument, map calculates one with this algorithm:

chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
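Plugging the question's numbers into this formula (a hypothetical standalone calculation, assuming 999 URLs and a pool of 4 processes as in the question) reproduces exactly the stride seen in the output:

```python
# Recompute the default chunk size the same way map() does,
# using the question's inputs: 999 URLs and a pool of 4 workers.
iterable_len = 999   # len(list_of_urls) for range(1, 1000)
pool_size = 4        # processes=4

chunksize, extra = divmod(iterable_len, pool_size * 4)
if extra:
    chunksize += 1

print(chunksize)  # 63
```

With chunks of 63 items, the four workers start at pages 1, 64, 127, and 190, which is precisely the interleaving in the question's output.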

map splits your iterable into chunks of that size and hands each chunk to a worker process as a single task, so each worker runs straight through its own chunk. That is why the output is interleaved rather than sequential. The solution is to set the chunk size to 1, so tasks are handed out one at a time in list order.

import multiprocessing
import time

list_test = range(10)

def process(task):
    print("task:", task)
    time.sleep(1)

pool = multiprocessing.Pool(processes=3)
pool.map(process, list_test, chunksize=1)

task: 0
task: 1
task: 2
task: 3
task: 4
task: 5
task: 6
task: 7
task: 8
task: 9
grzgrzgrz3 answered Oct 11 '22