I'm trying to generate a large PDF using a Flask application. The PDF generation involves generating ten long PDFs and then merging them together. The application runs under Gunicorn with the flags: --worker-class gevent --workers 2.
Here's what my server-side code looks like:
from flask import Response, stream_with_context

@app.route('/pdf/create', methods=['POST', 'GET'])
def create_pdf():
    def generate():
        for section in pdfs:
            yield "data: Generating %s pdf\n\n" % section
            # Generate pdf with pisa (takes up to 2 minutes)
        yield "data: Merging PDFs\n\n"
        # Merge pdfs (takes up to 2 minutes)
        yield "data: /user/pdf_filename.pdf\n\n"
    return Response(stream_with_context(generate()), mimetype='text/event-stream')
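For reference, each string yielded above is one server-sent event: a `data:` field terminated by a blank line. A minimal sketch of a helper that builds that framing follows; the name `sse_message` is hypothetical and not part of Flask. It also handles multi-line payloads, which need one `data:` line per line of text.

```python
def sse_message(data):
    """Format a payload as a single server-sent event (hypothetical helper)."""
    # Every line of the payload needs its own "data: " prefix.
    lines = ["data: %s" % line for line in str(data).split("\n")]
    # A blank line (two newlines in total) ends the event.
    return "\n".join(lines) + "\n\n"
```

With this, `yield sse_message("Merging PDFs")` would produce the same bytes as the hand-written string in the route.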
The client-side code looks like:
var source = new EventSource(create_pdf_url);
source.onopen = function (event) {
    console.log("Creating PDF");
};
source.onmessage = function (event) {
    console.log(event.data);
};
source.onerror = function (event) {
    console.log("ERROR");
};
When I run the app without Gunicorn, the console log shows steady, real-time updates. They look like:
Creating PDF
Generating section one
Generating section two
Generating section three
...
Generating section ten
Merging PDFs
/user/pdf_filename.pdf
When I run the same code under Gunicorn, I don't get regular updates. The worker runs until Gunicorn's timeout kills it, and then I get a dump of all the messages that should have arrived by then, followed by a final error:
Creating PDF
Generating section one
Generating section two
ERROR
The Gunicorn log looks like:
[2015-03-19 21:57:27 +0000] [3163] [CRITICAL] WORKER TIMEOUT (pid:3174)
How can I keep Gunicorn from killing the process? I don't think setting a very large timeout is a good idea. Perhaps there's something in Gunicorn's worker classes that I can use to make sure the process is handled correctly?
Gunicorn should only need 4-12 worker processes to handle hundreds or thousands of requests per second.
Gunicorn also allows for each of the workers to have multiple threads. In this case, the Python application is loaded once per worker, and each of the threads spawned by the same worker shares the same memory space.
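As a rough illustration of the worker and thread settings described above, a Gunicorn config file might look like the sketch below. The setting names (`workers`, `threads`, `worker_class`, `timeout`) are real Gunicorn settings, but the values are assumptions picked for illustration, not tuned recommendations. Note that `threads` applies to the `gthread` worker class, not to `gevent`.

```python
# gunicorn.conf.py (illustrative values only, not a tuned configuration)
import multiprocessing

# Gunicorn's docs suggest (2 x cores) + 1 as a starting point for workers.
workers = multiprocessing.cpu_count() * 2 + 1

# Threads per worker; only the gthread worker class uses this setting.
worker_class = "gthread"
threads = 4

# Seconds a worker may stay silent before the master kills and restarts it
# (30 is Gunicorn's default).
timeout = 30
```

Gunicorn would pick this up with `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is an assumed module and application name.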
I ended up solving the problem using Celery.
I used this example to guide me in setting up Celery.
Then I used Grinberg's Celery tutorial to stream real-time updates to the user's browser.
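The general shape of that approach is to move the slow work out of the request handler and have the handler only relay progress. The sketch below is not Celery: it uses a standard-library thread and `queue.Queue` as stand-ins for the Celery worker and its result backend, so it only illustrates the pattern. The names `build_pdf` and `stream_progress` are hypothetical. A thread inside the web worker would not by itself avoid the Gunicorn timeout; Celery runs the task in a separate process.

```python
import queue
import threading


def build_pdf(sections, progress):
    """Hypothetical stand-in for the slow job (render each section, then merge)."""
    for section in sections:
        progress.put("Generating %s pdf" % section)
        # ... render one section here ...
    progress.put("Merging PDFs")
    # ... merge the rendered sections here ...
    progress.put("/user/pdf_filename.pdf")
    progress.put(None)  # sentinel: no more updates


def stream_progress(sections):
    """Yield SSE-formatted progress messages while the job runs elsewhere."""
    progress = queue.Queue()
    worker = threading.Thread(
        target=build_pdf, args=(sections, progress), daemon=True
    )
    worker.start()
    while True:
        message = progress.get()  # blocks until the job reports something
        if message is None:
            break
        yield "data: %s\n\n" % message
    worker.join()
```

In a Flask route, `stream_progress(pdfs)` could be passed to `Response(..., mimetype='text/event-stream')` in place of the original `generate()`.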