Health check in GKE on GCloud resets after I change it from HTTP to TCP

I'm working on a Kubernetes cluster where I route traffic from a GCloud Ingress to my Services. One of the service endpoints fails its health check as HTTP but passes it as TCP.

When I change the health check options inside GCloud to TCP, the health checks pass and my endpoint works, but after a few minutes the health check on GCloud for that port resets back to HTTP, the checks fail again, and my endpoint returns a 502.

I don't know if it's a bug inside Google Cloud or something I'm doing wrong in Kubernetes. I have pasted my YAML configuration here:

namespace

apiVersion: v1
kind: Namespace
metadata:
  name: parity
  labels:
    name: parity

storageclass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: classic-ssd
  namespace: parity
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  zones: us-central1-a
reclaimPolicy: Retain

secret

apiVersion: v1
kind: Secret
metadata:
    name: tls-secret 
    namespace: ingress-nginx 
type: kubernetes.io/tls
data:
    tls.crt: ./config/redacted.crt   # must be the base64-encoded certificate contents, not a file path
    tls.key: ./config/redacted.key   # must be the base64-encoded key contents, not a file path

statefulset

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: parity
  namespace: parity
  labels:
    app: parity
spec:
  replicas: 3 
  selector:
    matchLabels:
      app: parity
  serviceName: parity
  template:
    metadata:
      name: parity
      labels:
        app: parity
    spec:
      containers:
        - name: parity
          image: "etccoop/parity:latest"
          imagePullPolicy: Always
          args:
          - "--chain=classic"
          - "--jsonrpc-port=8545"
          - "--jsonrpc-interface=0.0.0.0"
          - "--jsonrpc-apis=web3,eth,net"
          - "--jsonrpc-hosts=all"
          ports:
            - containerPort: 8545
              protocol: TCP
              name: rpc-port
            - containerPort: 443
              protocol: TCP
              name: https
          readinessProbe:
            tcpSocket:
              port: 8545
            initialDelaySeconds: 650
          livenessProbe:
            tcpSocket:
              port: 8545
            initialDelaySeconds: 650
          volumeMounts:
            - name: parity-config
              mountPath: /parity-config
              readOnly: true
            - name: parity-data
              mountPath: /parity-data
      volumes:
      - name: parity-config
        secret:
          secretName: parity-config
  volumeClaimTemplates:
    - metadata:
        name: parity-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "classic-ssd"
        resources:
          requests:
            storage: 50Gi

service

apiVersion: v1
kind: Service
metadata:
  labels:
    app: parity
  name: parity
  namespace: parity
  annotations:
    cloud.google.com/app-protocols: '{"my-https-port":"HTTPS","my-http-port":"HTTP"}' # note: these keys should match the Service port names below (default, rpc-endpoint, https)
spec:
  selector:
    app: parity
  ports:
  - name: default
    protocol: TCP
    port: 80
    targetPort: 80
  - name: rpc-endpoint
    port: 8545
    protocol: TCP
    targetPort: 8545
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  type: LoadBalancer

ingress

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
    name: ingress-parity
    namespace: parity
    annotations:
        #nginx.ingress.kubernetes.io/rewrite-target: /
        kubernetes.io/ingress.global-static-ip-name: cluster-1
spec:
    tls:
    - secretName: tls-classic
      hosts:
      - www.redacted.com
    rules:
    - host: www.redacted.com
      http:
        paths:
        - path: /
          backend:
            serviceName: web
            servicePort: 8080
        - path: /rpc
          backend:
            serviceName: parity 
            servicePort: 8545

Issue

I've redacted hostnames and such, but this is my basic configuration. For debugging, I've also deployed the hello-app container from this tutorial: https://cloud.google.com/kubernetes-engine/docs/tutorials/hello-app

That is what the / path on the Ingress points to: port 8080 of the hello-app service. It works fine and isn't the issue; I only mention it for clarification.

So, the issue: after creating my cluster with GKE and my Ingress LoadBalancer on Google Cloud (the cluster-1 global static IP name in the Ingress file), and then applying the Kubernetes configuration above, the health check for the /rpc endpoint fails. I can see this under Google Compute Engine -> Health checks -> the specific health check for the /rpc backend.

When I edit that health check to use the TCP protocol instead of HTTP, the checks pass for the /rpc endpoint and I can curl it just fine afterwards; it returns the correct response.

A few minutes later, though, the same health check goes back to the HTTP protocol even though I edited it to be TCP, the checks fail, and I get a 502 response when I curl the endpoint again.

I am not sure whether there is a way to attach the Google Cloud health check configuration to my Kubernetes Ingress before creating the Ingress. I'm also not sure why it keeps being reset; I can't tell whether it's a bug on Google Cloud or something I'm doing wrong in Kubernetes. Note that in my StatefulSet I have specified the livenessProbe and readinessProbe to use TCP to check port 8545.
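
For reference, newer GKE versions let a health check be declared per backend Service through a BackendConfig resource, which the ingress controller then applies to the load balancer. The sketch below is only an assumption that BackendConfig is available in the cluster (older clusters use apiVersion cloud.google.com/v1beta1, if at all); the name parity-hc, the /healthz path, and port 8080 are illustrative, and these checks are HTTP-based, so the pod would still need to answer HTTP on that path:

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: parity-hc            # hypothetical name
  namespace: parity
spec:
  healthCheck:
    type: HTTP               # BackendConfig health checks are HTTP-based, not TCP
    requestPath: /healthz    # illustrative path the pod must serve
    port: 8080

The Service would then reference it with an annotation such as cloud.google.com/backend-config: '{"ports": {"rpc-endpoint": "parity-hc"}}'.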

The 650-second initial delay comes from this issue, which was solved by increasing the delay to more than 600 seconds (to avoid the race conditions mentioned there): https://github.com/kubernetes/ingress-gce/issues/34

I really am not sure why the Google Cloud health check resets back to HTTP after I've set it to TCP. Any help would be appreciated.

1 Answer

I found a solution: I added a new health-check container to my StatefulSet that serves a /healthz endpoint, and configured the Ingress health check to hit that endpoint, on the port 8080 assigned by Kubernetes, as an HTTP health check. That made it work.
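
A minimal sketch of that approach, assuming a sidecar image that returns 200 OK on /healthz (the image name, port, and probe timing below are placeholders, not the exact values used):

# Extra container added to the StatefulSet pod template:
        - name: healthz
          image: example/healthz:latest      # placeholder: any image that serves 200 OK on /healthz
          ports:
            - containerPort: 8080
              name: healthz
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10

# Matching port added to the parity Service so the HTTP health check can reach the pods:
#   - name: healthz
#     port: 8080
#     protocol: TCP
#     targetPort: 8080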

It's still not immediately obvious why the reset happens when the check is set to TCP; most likely the GKE ingress controller reconciles the health checks it created and overwrites manual edits made in the console.
