
504 Gateway Timeout - Two EC2 instances with load balancer

This might be the impossible issue. I've tried everything. I feel like there's a guy at a switchboard somewhere, twirling his mustache.

The problem:

I have Amazon EC2 running an application. It functions without issue when there is only one instance and no load balancer.

But in my production environment I have two identical instances running behind one load balancer. When performing certain tasks, such as a feature that generates a PDF and attaches it to an email, nothing happens at all. The Network tab in Google Developer Tools shows the error "504 Gateway Timeout" once the timeout hits (I have it set to 30 seconds).

My Database is external, on Amazon RDS.

I think that if I could force a client to stay connected to the server they initially logged in to, this problem would be solved, because my understanding is that the 504 Gateway Timeout happens when instance-1 tries to reach out to instance-2 to perform the task.

This happens ONLY WHEN using Load Balancing, but never when connecting straight to one of my two servers.

Load Balancer Settings:

  • The load balancer has a CNAME record at my registrar so that app.myapplication.com points to myloadbalancerDNSname.elb.amazonaws.com
  • The load balancer has 2 healthy instances, each in the same region but they are in different availability zones.
  • The load balancer is using the same Security Groups as the Instances (allow ALL IPs on ports 22, 80, and 443)
  • The load balancer has cross-zone load balancing turned on.
  • CORS (in Amazon S3) is enabled to GET, POST, PUT, DELETE from * to * (I have no idea how this is associated with my instances but anyway I did it as the instructions said)
  • The load balancer has listeners configured as such:
    • Load Balancer Protocol:HTTP Load Balancer Port:80 Instance Protocol:HTTP Instance Port:80
    • Load Balancer Protocol:HTTPS Load Balancer Port:443 Instance Protocol:HTTP Instance Port:80 (cipher chosen correctly per my Cert provider, and SSL fields 100% surely correct)

Some more ideas:

That being said, I'm not testing with HTTPS, but with normal HTTP instead. I'm not convinced SSL is set up properly even though my certificate provider said it is. The reason I'm suspicious is that when I try to key in https://app.myapplication.com I get the error "(failed) net::ERR_CONNECTION_CLOSED" in the Network tab of Google Developer Tools. But this should not be applicable because I'm having the problem even with regular HTTP. I can troubleshoot SSL later.

So to reiterate, my problem is the "504 Gateway Timeout" error when using some functions, and also occasionally (but rarely) at random when a page should simply load. This 504 problem happens ONLY WHEN using Load Balancing, but never when connecting straight to one of my two instances.

I don't know which question to ask, because I've followed every document to the T, double- and triple-checked all suggestions all over the web, and NOTHING.

asked Oct 24 '14 06:10 by user3035649



2 Answers

What web server are you using? I had a very similar issue with nginx and AWS load balancing. I added keepalive_timeout 75s; to the http block in my nginx config file and haven't seen the issue since.

Make sure you restart nginx after you add and save that line. On Ubuntu, run sudo service nginx restart. On Red Hat, stop nginx with /path/to/nginx/executable -s stop, then run /path/to/nginx/executable to start it again.

This fix was recommended by AWS on their load balancer troubleshooting help page.
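
To double-check the setting, here is a minimal Python sketch that reads an nginx config as text and tests whether keepalive_timeout is longer than the load balancer's idle timeout. The function names are made up for illustration, the 60-second default is the ELB idle timeout default mentioned in the other answer, and the parser only understands a plain number with an optional s, m, or h suffix, so treat it as a rough check rather than a full nginx config parser.

```python
import re

# Seconds per unit for the small subset of nginx time suffixes handled here
# (assumption: the config uses a single plain value such as 75, 75s, 2m or 1h).
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}

# Matches an uncommented "keepalive_timeout <number><unit>" at the start of a line.
_KEEPALIVE_RE = re.compile(r"^\s*keepalive_timeout\s+(\d+)([smh]?)[\s;]", re.MULTILINE)


def keepalive_seconds(nginx_conf):
    """Return the first keepalive_timeout in seconds, or None if not found."""
    match = _KEEPALIVE_RE.search(nginx_conf)
    if match is None:
        return None
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def outlives_idle_timeout(nginx_conf, elb_idle_timeout=60):
    """True if nginx keeps connections open longer than the ELB idle timeout."""
    seconds = keepalive_seconds(nginx_conf)
    return seconds is not None and seconds > elb_idle_timeout


if __name__ == "__main__":
    sample = "http {\n    keepalive_timeout 75s;\n}\n"
    print(keepalive_seconds(sample), outlives_idle_timeout(sample))
```

In practice you would pass in the contents of your own config file (for example the text read from /etc/nginx/nginx.conf, if that is where yours lives) and your load balancer's actual idle timeout.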

answered Oct 26 '22 19:10 by Maximus


First, what is the Idle Timeout for your ELB set to? You'll find it at the very bottom of the "Description" tab for your load balancer. You can read more about the idle timeout here in the ELB documentation. The default is 60 seconds. You should also consider setting or increasing Keep-alive in your web server. How you do that will depend on what web server you are using.

Second, if you think it's due to the client being switched from one instance to the other then you should enable session stickiness in the ELB. This will ensure that a client is always directed to the same back-end instance by the load balancer. To enable this, again go to the "Description" tab then click on the Edit link next to each entry in the Port Configuration section. You'll likely want to choose the "Enable Load Balancer Generated Cookie Stickiness" option since that will tell the ELB to manage all aspects of the stickiness.
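
One way to confirm that stickiness is actually in effect is to look for the load balancer's cookie in the response headers. The sketch below assumes a Classic ELB with load-balancer-generated cookie stickiness, which sets a cookie named AWSELB; the function name is hypothetical and the sample header values are invented for illustration.

```python
from http.cookies import SimpleCookie

# Cookie name set by a Classic ELB when load-balancer-generated
# cookie stickiness is enabled.
ELB_COOKIE_NAME = "AWSELB"


def elb_sticky_cookie(set_cookie_headers):
    """Return the AWSELB cookie value from Set-Cookie header strings, or None."""
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        if ELB_COOKIE_NAME in jar:
            return jar[ELB_COOKIE_NAME].value
    return None


if __name__ == "__main__":
    # Invented sample headers: one application cookie, one ELB cookie.
    headers = ["PHPSESSID=abc123; path=/", "AWSELB=ABCD1234;PATH=/"]
    print(elb_sticky_cookie(headers))
```

With a real response from urllib.request.urlopen, the list of headers can be obtained with response.headers.get_all("Set-Cookie") or []. If the cookie is missing on responses that come through the load balancer, the stickiness policy is probably not attached to that listener.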

answered Oct 26 '22 20:10 by Bruce P