Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Elastic Beanstalk Worker's SQS daemon getting 504 gateway timeout after 1 minute

I have an Elastic Beanstalk worker that can only run one task at a time and it takes some time to do so (from a few minutes to, hopefully, less than 30 minutes), so I'm queuing my tasks on a SQS.

On my worker configuration, I have:

HTTP connections: 1
Visibility timeout: 3600
Error visibility timeout: 300

(On "Advanced")
Inactivity timeout: 1800

The problem is that there seems to be a 1 minute timeout (on nginx?) that overrides the "Inactivity timeout", returning a 504 (Gateway timeout).

This is what I can find on the aws-sqsd.log file:

2016-02-03T16:16:27Z init: initializing aws-sqsd 2.0 (2015-02-18)
2016-02-03T16:16:27Z start: polling https://sqs.eu-central-1.amazonaws.com/855381918026/jitt-publisher-queue
2016-02-03T16:23:36Z message: sent to %[http://localhost:80]
2016-02-03T16:24:36Z http-err: 1444d1ba-ecb5-46f8-82d6-d0bf19b91fad (1) 504 - 60.006
2016-02-03T16:28:54Z message: sent to %[http://localhost:80]
2016-02-03T16:29:54Z http-err: 1b7514d3-689a-4e8b-a569-5ef1ac32ed0c (1) 504 - 60.029
2016-02-03T16:29:54Z message: sent to %[http://localhost:80]
2016-02-03T16:29:54Z http-err: 1444d1ba-ecb5-46f8-82d6-d0bf19b91fad (2) 500 - 0.006
2016-02-03T16:33:49Z message: sent to %[http://localhost:80]
2016-02-03T16:34:49Z http-err: 3a43e80f-a8d3-46b2-b2a0-9d898ad4f2a6 (1) 504 - 60.023
2016-02-03T16:34:54Z message: sent to %[http://localhost:80]
2016-02-03T16:34:54Z http-err: 1b7514d3-689a-4e8b-a569-5ef1ac32ed0c (2) 500 - 0.004
2016-02-03T16:34:54Z message: sent to %[http://localhost:80]
2016-02-03T16:34:54Z http-err: 1444d1ba-ecb5-46f8-82d6-d0bf19b91fad (3) 500 - 0.003
2016-02-03T16:39:49Z message: sent to %[http://localhost:80]
2016-02-03T16:40:49Z http-err: 3a43e80f-a8d3-46b2-b2a0-9d898ad4f2a6 (2) 504 - 60.019

Some things make sense here, like the 5 minute delay that each message takes from the time of the 504/500 until the task is re-sent to the worker once again (which matches the 300 seconds configuration for the "Error visibility timeout").

Those 500 codes match my current logic: the worker rejects the task by throwing a 500 back if there's still something running.

I have seen a lot of answers talking about setting the Load Balancer connection timeout setting, but, since this is a worker pulling messages from a SQS queue, there is no Load Balancer.

Any idea on what I should do to override that 1 minute timeout setting?

like image 694
Tiago Fael Matos Avatar asked Feb 04 '16 04:02

Tiago Fael Matos


1 Answers

Since I had the time to investigate this a little better, the solution is to add an ebextension that configures the proxy timeout settings:

files:
    "/etc/nginx/sites-available/elasticbeanstalk-nginx-docker-proxy-timeout.conf":
        mode: "000644"
        owner: root
        group: root
        content: |
            proxy_connect_timeout       3600;
            proxy_send_timeout          3600;
            proxy_read_timeout          3600;
            send_timeout                3600;
commands:
    "00nginx-create-proxy-timeout":
        command: "if [[ ! -h /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy-timeout.conf ]] ; then ln -s /etc/nginx/sites-available/elasticbeanstalk-nginx-docker-proxy-timeout.conf /etc/nginx/sites-enabled/elasticbeanstalk-nginx-docker-proxy-timeout.conf ; fi"

Source: http://cloudavail.com/2015/10/18/allowing-long-idle-timeouts-when-using-aws-elasticbeanstalk-and-docker/

like image 139
Tiago Fael Matos Avatar answered Nov 04 '22 15:11

Tiago Fael Matos