I have a gRPC server that works fine on my local machine. I can send gRPC requests from a Python app and get back the right responses.
I put the server into a GKE cluster (with only one node) and had a normal TCP load balancer in front of the cluster. In this setup my local client was able to get the correct response from some requests, but not others. I think it was the gRPC streaming that didn't work.
I assumed this was because streaming requires an HTTP/2 connection, which requires SSL.
The standard load balancer I got in GKE didn't seem to support SSL, so I followed the docs to set up an ingress load balancer, which does. I'm using a Let's Encrypt certificate with it.
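For reference, the client side of that change looks roughly like this (a minimal sketch; the generated stub and request names are placeholders rather than my real ones):

import grpc

# Placeholder generated modules, for illustration only:
# import rev79_pb2, rev79_pb2_grpc

# ssl_channel_credentials() with no arguments uses the default root certificates,
# which is enough to verify a Let's Encrypt certificate.
credentials = grpc.ssl_channel_credentials()
channel = grpc.secure_channel("sub-domain.domain.app:443", credentials)

# stub = rev79_pb2_grpc.Rev79Stub(channel)
# response = stub.SomeUnaryCall(rev79_pb2.SomeRequest())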
Now all gRPC requests return
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = "{"created":"@1556172211.931158414","description":"Error received from peer ipv4:ip.of.ingress.service:443", "file":"src/core/lib/surface/call.cc", "file_line":1041,"grpc_message":"Socket closed","grpc_status":14}"
The IP address is the external IP address of my ingress service. The ingress yaml looks like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: rev79-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "rev79-ip"
    ingress.gcp.kubernetes.io/pre-shared-cert: "lets-encrypt-rev79"
    kubernetes.io/ingress.allow-http: "false" # disable HTTP
spec:
  rules:
  - host: sub-domain.domain.app
    http:
      paths:
      - path: /*
        backend:
          serviceName: sandbox-nodes
          servicePort: 60000
The subdomain and domain of the request from my Python app match the host in the ingress rule.
It connects to a NodePort service that looks like this:
apiVersion: v1
kind: Service
metadata:
  name: sandbox-nodes
spec:
  type: NodePort
  selector:
    app: rev79
    environment: sandbox
  ports:
  - protocol: TCP
    port: 60000
    targetPort: 9000
The pod behind it has two containers; the deployment looks like this:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rev79-sandbox
  labels:
    app: rev79
    environment: sandbox
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rev79
        environment: sandbox
    spec:
      containers:
      - name: esp
        image: gcr.io/endpoints-release/endpoints-runtime:1.31
        args: [
          "--http2_port=9000",
          "--service=rev79.endpoints.rev79-232812.cloud.goog",
          "--rollout_strategy=managed",
          "--backend=grpc://0.0.0.0:3011"
        ]
        ports:
        - containerPort: 9000
      - name: rev79-uac-sandbox
        image: gcr.io/rev79-232812/uac:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 3011
        env:
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rev79-secrets
              key: rails-master-key
The target of the NodePort is the ESP container, which connects to the gRPC service deployed in the cloud and forwards traffic to the backend, a Rails app that implements the API. The Rails app isn't running the Rails server, but a specialised gRPC server that comes with the grpc_for_rails gem.
The gRPC server in the Rails app doesn't record any activity in the logs, so I don't think the requests get that far.
kubectl get ingress
reports this:
NAME            HOSTS                   ADDRESS             PORTS   AGE
rev79-ingress   sub-domain.domain.app   my.static.ip.addr   80      7h
showing port 80, even though it's set up with SSL. That seems to be a bug. When I check with curl -kv https://sub-domain.domain.app,
the ingress handles the request fine and uses HTTP/2. It returns an HTML-formatted server error, but I'm not sure what generates that.
The API requires an API key, which the Python client inserts into the metadata of each request.
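Roughly like this (a sketch; the x-api-key metadata name is what the Cloud Endpoints gRPC samples use, and the stub names are placeholders):

import os
import grpc

# Placeholder for wherever the key is actually stored.
api_key = os.environ["REV79_API_KEY"]

channel = grpc.secure_channel("sub-domain.domain.app:443",
                              grpc.ssl_channel_credentials())
# stub = rev79_pb2_grpc.Rev79Stub(channel)        # placeholder generated stub
# response = stub.SomeUnaryCall(
#     rev79_pb2.SomeRequest(),
#     metadata=[("x-api-key", api_key)],          # ESP reads the key from this metadata entry
# )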
When I go to the Endpoints page of my GCP console I see that the API hasn't registered any requests since I put in the ingress load balancer, so it looks like the requests are not reaching the ESP container.
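To check whether the connection dies before any gRPC traffic flows at all, client-side tracing plus a channel readiness check can be used, roughly like this (sketch):

# Run with gRPC's own client-side tracing enabled, e.g.:
#   GRPC_VERBOSITY=debug GRPC_TRACE=http,tcp python check_channel.py
import grpc

channel = grpc.secure_channel("sub-domain.domain.app:443",
                              grpc.ssl_channel_credentials())
try:
    # Blocks until the channel is READY, i.e. the TLS handshake and the
    # HTTP/2 settings exchange have both completed.
    grpc.channel_ready_future(channel).result(timeout=10)
    print("channel became READY")
except grpc.FutureTimeoutError:
    print("channel never became READY; the connection is cut before gRPC traffic flows")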
So why am I getting "socket closed" errors with gRPC?
I said I would come back and post an answer here once I got it working. It looks like I never did. Being a man of my word, I'll now post the config files that are working for me.
In my deployment I've added liveness and readiness probes for the ESP container. This made deployments happen smoothly, without downtime:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rev79-sandbox
  labels:
    app: rev79
    environment: sandbox
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: rev79
        environment: sandbox
    spec:
      volumes:
      - name: nginx-ssl
        secret:
          secretName: nginx-ssl
      - name: gcs-creds
        secret:
          secretName: rev79-secrets
          items:
          - key: gcs-credentials
            path: "gcs.json"
      containers:
      - name: esp
        image: gcr.io/endpoints-release/endpoints-runtime:1.45
        args: [
          "--http_port", "8080",
          "--ssl_port", "443",
          "--service", "rev79-sandbox.endpoints.rev79-232812.cloud.goog",
          "--rollout_strategy", "managed",
          "--backend", "grpc://0.0.0.0:3011",
          "--cors_preset", "cors_with_regex",
          "--cors_allow_origin_regex", ".*",
          "-z", " "
        ]
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 60
          timeoutSeconds: 5
          periodSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          timeoutSeconds: 5
          failureThreshold: 1
        volumeMounts:
        - name: nginx-ssl
          mountPath: /etc/nginx/ssl
          readOnly: true
        ports:
        - containerPort: 8080
        - containerPort: 443
          protocol: TCP
      - name: rev79-uac-sandbox
        image: gcr.io/rev79-232812/uac:29eff5e
        imagePullPolicy: Always
        volumeMounts:
        - name: gcs-creds
          mountPath: "/app/creds"
        ports:
        - containerPort: 3011
          name: end-grpc
        - containerPort: 3000
        env:
        - name: RAILS_MASTER_KEY
          valueFrom:
            secretKeyRef:
              name: rev79-secrets
              key: rails-master-key
This is my service config that exposes the deployment to the load balancer:
apiVersion: v1
kind: Service
metadata:
  name: rev79-srv-ingress-sandbox
  labels:
    type: rev79-srv
  annotations:
    service.alpha.kubernetes.io/app-protocols: '{"rev79":"HTTP2"}'
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: NodePort
  ports:
  - name: rev79
    port: 443
    protocol: TCP
    targetPort: 443
  selector:
    app: rev79
    environment: sandbox
And this is my ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: rev79-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "rev79-global-ip"
spec:
  tls:
  - secretName: sandbox-api-rev79-app-tls
  rules:
  - host: sandbox-api.rev79.app
    http:
      paths:
      - backend:
          serviceName: rev79-srv-ingress-sandbox
          servicePort: 443
I'm using cert-manager to manage the certificates.
It was a long time ago now. I can't remember if there was anything else I did to solve the issue I was having.