
Python 3 queue produced by generator, consumed by multiprocessing

I have a generator that will generate more than 1 trillion strings. I would like to put them in a queue and let a pool of workers consume that queue. However, I can't afford to put all 1 trillion strings in memory and map them to workers.

The generator is very fast, but the consuming workers are not. I need to keep the queue at a certain length so it doesn't blow up my memory, which means I need a way to pause and resume feeding the queue.

Could anyone provide a hint on how to accomplish this in Python 3.4?

asked Aug 29 '15 05:08 by JimmyK

1 Answer

You can specify the maximum size of the queue:

import queue

q = queue.Queue(10)   # max size of the queue is 10

When the queue reaches its maximum size, new insertions block until items are removed from the queue.

Your producer thread can simply pull items from the generator and put them on the queue. If it gets too far ahead of the consumer threads, it will just block, which gives you the pause/resume behavior for free.

for e in my_generator():   # your generator
    q.put(e)               # blocks while the queue is full
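Putting the pieces together, here is a minimal runnable sketch of the pattern with threads. The generator, item count, worker count, and queue size are small placeholders for illustration; a `None` sentinel per worker signals the end of the stream:

```python
import queue
import threading

NUM_WORKERS = 4

def generate_strings(n):
    # stand-in for the real (trillion-item) generator
    for i in range(n):
        yield f"item-{i}"

q = queue.Queue(maxsize=10)        # at most 10 items buffered in memory
results = []                       # list.append is thread-safe in CPython

def worker():
    while True:
        item = q.get()
        if item is None:           # sentinel: no more work
            break
        results.append(item)       # real processing would happen here

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()

for s in generate_strings(100):
    q.put(s)                       # blocks whenever the buffer is full
for _ in threads:
    q.put(None)                    # one sentinel per worker

for t in threads:
    t.join()
print(len(results))                # all 100 items were consumed
```

Because `put` blocks on a full queue, the producer never holds more than `maxsize` unconsumed items in memory, no matter how far ahead the generator could run.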

See:

https://docs.python.org/3/library/queue.html

for more info.
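Since the question mentions multiprocessing: `multiprocessing.Queue` also accepts a `maxsize`, so the same back-pressure pattern works across processes when the per-item work is CPU-bound. A sketch with made-up sizes and a stand-in for the real work (note that the result queue is drained before `join`, to avoid the documented deadlock when a child exits with undelivered queue data):

```python
import multiprocessing as mp

def worker(q, out_q):
    # consume until the sentinel (None) arrives
    total = 0
    while True:
        item = q.get()
        if item is None:
            break
        total += len(item)             # stand-in for the real work
    out_q.put(total)

def main(n_items=10_000, n_workers=4):
    q = mp.Queue(maxsize=1000)         # bounded: put() blocks when full
    out_q = mp.Queue()
    workers = [mp.Process(target=worker, args=(q, out_q))
               for _ in range(n_workers)]
    for p in workers:
        p.start()
    for i in range(n_items):           # the real generator goes here
        q.put(f"string-{i}")           # back-pressure when workers lag
    for _ in workers:
        q.put(None)                    # one sentinel per worker
    totals = [out_q.get() for _ in workers]  # drain before join
    for p in workers:
        p.join()
    return sum(totals)

if __name__ == "__main__":
    print(main())
```

The only change from the threaded version is that workers are processes and results come back through a second queue instead of a shared list, since processes do not share memory.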

answered Nov 14 '22 19:11 by ErikR