I am using Gunicorn to serve my Flask web app. The app sends requests to download huge files, some more than 10 GB, which takes a while to complete. I stream the download progress back to the webpage using a generator, so the connection stays open until the download is done. My problem is that Gunicorn times out after a certain number of seconds.
I configured the timeout to be longer like this:
/usr/bin/gunicorn -c /my/dir/to/app/gunicorn.conf -b 0.0.0.0:5000 wsgi --timeout 90
but I don't know how long a download will take, so I have to keep raising this timeout as the files get larger and larger.
I was wondering if there is a way to disable the timeout altogether, or if there is another option to remedy long download times.
Worker timeouts: by default, Gunicorn gracefully restarts a worker if it hasn't completed any work within the last 30 seconds.
After 30 seconds (configurable with timeout) of request processing, the Gunicorn master process sends SIGTERM to the worker process to initiate a graceful restart. If the worker does not shut down within another 30 seconds (configurable with graceful_timeout), the master process sends SIGKILL.
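If your goal is to disable the timeout altogether, Gunicorn accepts 0 as a value for it, which disables worker timeouts entirely. A minimal sketch of a gunicorn.conf.py (the graceful_timeout shown is just the default, kept for illustration):

    # gunicorn.conf.py -- sketch; timeout = 0 means workers are never
    # killed for being silent, so a long streaming download can finish.
    timeout = 0
    graceful_timeout = 30  # seconds a worker gets to finish on restart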
The timeout setting you specify with Gunicorn is basically there to release a connection and restart a worker; Gunicorn kills workers it considers idle and restarts them. [1]
If you are streaming back the response, then IMO your worker shouldn't get knocked out and killed by the parent process. Note that a connection is idle when no data is sent or received by a host.
So here is what you might want to try; these are my personal suggestions.
Use the --threads setting with a value greater than 1; this way, your worker may not be sitting idle and could be serving other requests. [2]
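For instance, a hedged sketch of a gunicorn.conf.py using the gthread worker (the worker and thread counts are illustrative, not tuned values):

    # gunicorn.conf.py -- each worker process serves `threads` requests
    # concurrently, so one long-running download does not block the rest.
    workers = 2
    worker_class = "gthread"
    threads = 4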
Instead of specifying a timeout on the server, you could try providing a timeout in the request's headers. For this, you need to understand the Keep-Alive header, which has a timeout parameter. [3] and [4]
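As an illustration, here is a sketch of advertising a Keep-Alive timeout on a streamed Flask response; the header values are assumptions, and clients or proxies are free to ignore them:

    from flask import Flask, Response

    app = Flask(__name__)

    def generate():
        # Placeholder generator standing in for the real progress stream.
        for pct in range(0, 101, 10):
            yield f"progress: {pct}%\n"

    @app.route("/download")
    def download():
        resp = Response(generate(), mimetype="text/plain")
        # Suggest keeping the connection open for up to 600 seconds of
        # inactivity (illustrative values).
        resp.headers["Connection"] = "Keep-Alive"
        resp.headers["Keep-Alive"] = "timeout=600, max=1000"
        return resp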
Use multi-part download to speed up the download of the large file. For this, you need to break the download into chunks and then issue parallel requests to download that large file. [5]
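A hedged sketch of the idea with the requests library; the URL is hypothetical, and the remote server must support the Range header for this to work:

    import concurrent.futures
    import requests

    def download_part(url, start, end):
        # Fetch one byte range of the file.
        headers = {"Range": f"bytes={start}-{end}"}
        return requests.get(url, headers=headers, timeout=60).content

    def multipart_download(url, path, parts=4):
        size = int(requests.head(url, allow_redirects=True,
                                 timeout=60).headers["Content-Length"])
        step = size // parts
        ranges = [(i * step,
                   size - 1 if i == parts - 1 else (i + 1) * step - 1)
                  for i in range(parts)]
        with concurrent.futures.ThreadPoolExecutor(max_workers=parts) as pool:
            blocks = pool.map(lambda r: download_part(url, *r), ranges)
            with open(path, "wb") as f:
                for block in blocks:
                    f.write(block)

For 10 GB files you would stream each part to disk as it arrives rather than buffering whole ranges in memory as this sketch does.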
Since your objective seems to be streaming the progress of the download back to a webpage, instead of keeping the connection alive and open, use a polling technique to fetch the progress: poll every, say, 250-400 ms for an update. This way your system is more robust on slow network connections and scales to arbitrarily large files. The caveat is that you need to maintain the information of how much of the file has been downloaded somewhere the progress handler can read it. I personally built a multi-part download manager in Scala using the Actor framework.
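A minimal sketch of such a polling endpoint (the route and the in-memory progress store are assumptions; with several Gunicorn workers the state would have to live somewhere shared, such as Redis or a database):

    import threading
    from flask import Flask, jsonify

    app = Flask(__name__)
    progress = {}  # download_id -> bytes downloaded so far
    progress_lock = threading.Lock()

    @app.route("/progress/<download_id>")
    def get_progress(download_id):
        # The page polls this every ~250-400 ms instead of holding the
        # streaming connection open for the whole download.
        with progress_lock:
            done = progress.get(download_id, 0)
        return jsonify({"id": download_id, "bytes_downloaded": done})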
One more suggestion: you might also want to try a library like Flask-SocketIO. Although this use case is not really bidirectional communication, the point is that the socket remains open so you can push back progress updates. [6]
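A sketch of what that could look like; the "progress" event name and the reporting helper are assumptions, not part of the Flask-SocketIO API:

    from flask import Flask
    from flask_socketio import SocketIO

    app = Flask(__name__)
    socketio = SocketIO(app)

    def report_progress(bytes_done, total_bytes):
        # Push the current state over the open socket; the browser listens
        # for the "progress" event and updates the page.
        socketio.emit("progress", {"done": bytes_done, "total": total_bytes})

    if __name__ == "__main__":
        socketio.run(app, host="0.0.0.0", port=5000)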