Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

GKE + WebSocket + NodePort 30s dropped connections

I have a golang service that implements a WebSocket client using gorilla that is exposed to a Google Container Engine (GKE)/k8s cluster via a NodePort (30002 in this case).

I've got a manually created load balancer (i.e. NOT at k8s ingress/load balancer) with HTTP/HTTPS frontends (i.e. 80/443) that forward traffic to nodes in my GKE/k8s cluster on port 30002.

I can get my JavaScript WebSocket implementation in the browser (Chrome 58.0.3029.110 on OSX) to connect, upgrade and send / receive messages.

I log ping/pongs in the golang WebSocket client and all looks good until 30s in. 30s after connection my golang WebSocket client gets an EOF / close 1006 (abnormal closure) and my JavaScript code gets a close event. As far as I can tell, neither my Golang or JavaScript code is initiating the WebSocket closure.

I don't particularly care about session affinity in this case AFAIK, but I have tried both IP and cookie based affinity in the load balancer with long lived cookies.

Additionally, this exact same set of k8s deployment/pod/service specs and golang service code works great on my KOPS based k8s cluster on AWS through AWS' ELBs.

Any ideas where the 30s forced closures might be coming from? Could that be a k8s default cluster setting specific to GKE or something on the GCE load balancer?

Thanks for reading!

-- UPDATE --

There is a backend configuration timeout setting on the load balancer which is for "How long to wait for the backend service to respond before considering it a failed request".

The WebSocket is not unresponsive. It is sending ping/pong and other messages right up until getting killed which I can verify by console.log's in the browser and logs in the golang service.

That said, if I bump the load balancer backend timeout setting to 30000 seconds, things "work".

Doesn't feel like a real fix though because the load balancer will continue to feed actual unresponsive services traffic inappropriately, never mind if the WebSocket does become unresponsive.

I've isolated the high timeout setting to a specific backend setting using a path map, but hoping to come up with a real fix to the problem.

like image 987
Jeremy Gordon Avatar asked May 17 '17 19:05

Jeremy Gordon


1 Answers

I think this may be Working as Intended. Google just updated the documentation today (about an hour ago).

LB Proxy Support docs

Backend Service Components docs

Cheers,

Matt

like image 91
brugz Avatar answered Oct 26 '22 10:10

brugz