 

Some 502 errors in GCP HTTP Load Balancing


Our load balancer is returning 502 errors for a small fraction of requests. We serve around 36,000 requests per hour and see about 40 errors per hour, so roughly 0.1% of requests fail.

The instances are healthy when the errors occur, and we have added a firewall rule to allow the load balancer's traffic: source range 130.211.0.0/22, tcp:1-5000, applied to all targets.
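For reference, a rule like that can be created with gcloud roughly as follows (the rule name here is just illustrative):

    gcloud compute firewall-rules create allow-glb \
        --source-ranges=130.211.0.0/22 \
        --allow=tcp:1-5000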

It is not a very serious problem because the application tolerates such errors, but I would like to know why they happen.

Any help will be appreciated.

asked Dec 23 '16 by Jordi


People also ask

What are the three categories of GCP load balancing?

For HTTP and HTTPS traffic: the global external HTTP(S) load balancer, the global external HTTP(S) load balancer (classic), and the regional external HTTP(S) load balancer.

What is HTTP load balancer in GCP?

External HTTP(S) Load Balancing is a proxy-based Layer 7 load balancer that enables you to run and scale your services behind a single external IP address.

When should I use load balancer in GCP?

Load balancers are managed services on GCP that distribute traffic across multiple instances of your application. GCP bears the burden of managing operational overhead and reduces the risk of having a non-functional, slow, or overburdened application.

What is a serverless NEG?

A serverless NEG is a backend that points to a Cloud Run, App Engine, Cloud Functions, or API Gateway service. A serverless NEG can represent one of the following: A Cloud Run service or a group of services. A Cloud Functions function or a group of functions.


2 Answers

It seems that there is no easy solution for this.

As Mike Fotinakis explains in this blog post (thank you for this info, JasonG :)):

It turns out that there is a race condition between the Google Cloud HTTP(S) Load Balancer and NGINX’s default keep-alive timeout of 65 seconds. The NGINX timeout might be reached at the same time the load balancer tries to re-use the connection for another HTTP request, which breaks the connection and results in a 502 Bad Gateway response from the load balancer.

In my case I'm using Apache with the mpm_prefork module. The proposed solution is to increase the connection keep-alive timeout to 650 seconds, but that is not feasible here: with mpm_prefork each open connection holds a whole process, so keeping connections alive that long would waste a lot of resources.

UPDATE:
There is now some documentation about this problem on the official load balancer documentation page (search for "Timeouts and retries"): https://cloud.google.com/compute/docs/load-balancing/http/

It recommends setting the keep-alive timeout to 620 seconds in both cases (Apache and NGINX).
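For reference, those settings look roughly like this (file locations vary by distribution; the 620-second value is the one from the documentation above):

    # Apache (e.g. httpd.conf / apache2.conf)
    KeepAlive On
    KeepAliveTimeout 620

    # NGINX (inside the http { } block, e.g. /etc/nginx/nginx.conf)
    keepalive_timeout 620s;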

answered Oct 12 '22 by Jordi


I had an issue with 502s that I couldn't explain after recreating a load balancer and backend configuration. I recreated my backend and instance group for unmanaged instances, and that seemed to fix it for me. I wasn't able to identify any problem with my configuration in GCP :(

But I had a lot more errors, about 1 in 10 requests. The load balancer logs will tell you what the cause is, and the docs explain what each cause means.

E.g. mine were: jsonPayload: { statusDetails: "failed_to_pick_backend", @type: "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry" }
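If you want to pull these entries up yourself, a Cloud Logging filter along these lines should surface them (assuming your load balancer logs use the usual http_load_balancer resource type):

    resource.type="http_load_balancer"
    jsonPayload.statusDetails="failed_to_pick_backend"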

If you're using NGINX, the errors happen on POSTs, and the cause is reported as "backend_connection_closed_before_data_sent_to_client", it may be fixed by changing your NGINX timeouts. See this excellent blog post:

https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.btzyusgi6

answered Oct 12 '22 by JasonG