I have this script to process some URLs in parallel:
import multiprocessing
import time

list_of_urls = []
for i in range(1, 1000):
    list_of_urls.append('http://example.com/page=' + str(i))

def process_url(url):
    # Extract the page number from the URL for logging.
    page_processed = url.split('=')[1]
    print('Processing page %s' % page_processed)
    time.sleep(5)

pool = multiprocessing.Pool(processes=4)
pool.map(process_url, list_of_urls)
The list is ordered, but when I run it, the script doesn't pick URLs from the list in order:
Processing page 1
Processing page 64
Processing page 127
Processing page 190
Processing page 65
Processing page 2
Processing page 128
Processing page 191
Instead, I would like it to process pages 1, 2, 3, 4 first, then continue following the order of the list. Is there an option to do this?
If you do not pass the chunksize argument, map computes the chunk size with this algorithm:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
if extra:
    chunksize += 1
map cuts your iterable into chunks of that size and submits each chunk to a worker as a single task, so every process works through a contiguous block of URLs rather than taking one URL at a time. That is why the output is not in order. The solution is to set the chunk size to 1.
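As a sanity check against the output above, here is the same arithmetic re-run by hand for the question's numbers (999 URLs from range(1, 1000), 4 processes); the variable names are illustrative, not from the multiprocessing source:

n_items = 999                                    # len(list_of_urls)
n_procs = 4                                      # processes=4
chunksize, extra = divmod(n_items, n_procs * 4)  # divmod(999, 16) -> (62, 7)
if extra:
    chunksize += 1                               # extra is nonzero -> chunksize = 63
print(chunksize)                                 # 63

A chunk size of 63 matches the stride in the output: the four workers start on pages 1, 64, 127, and 190, which is exactly what the question shows.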
import multiprocessing
import time

list_test = range(10)

def process(task):
    print('task:', task)
    time.sleep(1)

pool = multiprocessing.Pool(processes=3)
pool.map(process, list_test, chunksize=1)
Output:

task: 0
task: 1
task: 2
task: 3
task: 4
task: 5
task: 6
task: 7
task: 8
task: 9
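One caveat that is independent of chunksize: on platforms where worker processes are spawned rather than forked (Windows, and macOS on current Python versions), creating the Pool at module level makes each worker re-import the script and fail. Here is a minimal sketch of the fix applied to the question's script, with the usual __main__ guard added (otherwise the names are unchanged from the question):

import multiprocessing
import time

def process_url(url):
    # Same worker as in the question: extract the page number from the URL.
    page_processed = url.split('=')[1]
    print('Processing page %s' % page_processed)
    time.sleep(5)

if __name__ == '__main__':
    list_of_urls = ['http://example.com/page=' + str(i) for i in range(1, 1000)]
    pool = multiprocessing.Pool(processes=4)
    # chunksize=1 dispatches one URL at a time, in list order.
    pool.map(process_url, list_of_urls, chunksize=1)
    pool.close()
    pool.join()

Even with chunksize=1, the four workers run concurrently, so pages 1-4 start together and their print lines can still interleave; chunksize=1 guarantees the dispatch order, not a strict serialization of the output.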