Kubernetes Traefik internal server error on every other request

I'm using Traefik 2.2 on a bare-metal Kubernetes cluster with a single master node. I don't have a physical or virtual load balancer, so the Traefik pod takes in all requests on ports 80 and 443. I have an example WordPress site installed with Helm. As you can see at http://wp-example.cryptexlabs.com/feed/, exactly every other request returns a 500 error. I can confirm that the failing requests never reach the WordPress container, so I know this has something to do with Traefik. The Traefik logs only show that there was a 500 error.

I have 1 pod in the traefik namespace, a service in the default namespace, and an ExternalName service in the default namespace that points to the example WordPress site, which is in the wp-example namespace.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: traefik
    chart: traefik-0.2.0
    heritage: Tiller
    release: traefik
  name: traefik
  namespace: traefik
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: traefik
      release: traefik
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: traefik
        chart: traefik-0.2.0
        heritage: Tiller
        release: traefik
    spec:
      containers:
      - args:
        - --api.insecure
        - --accesslog
        - --entrypoints.web.Address=:80
        - --entrypoints.websecure.Address=:443
        - --providers.kubernetescrd
        - --certificatesresolvers.default.acme.tlschallenge
        - [email protected]
        - --certificatesresolvers.default.acme.storage=acme.json
        - --certificatesresolvers.default.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
        image: traefik:2.2
        imagePullPolicy: IfNotPresent
        name: traefik
        ports:
        - containerPort: 80
          hostPort: 80
          name: web
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: websecure
          protocol: TCP
        - containerPort: 8088
          name: admin
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: traefik-service-account
      serviceAccountName: traefik-service-account
      terminationGracePeriodSeconds: 60
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: wp-example.cryptexlabs.com
  namespace: wp-example
spec:
  entryPoints:
  - web
  routes:
  - kind: Rule
    match: Host(`wp-example.cryptexlabs.com`)
    services:
    - name: wp-example
      port: 80
    - name: wp-example
      port: 443
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: wp-example
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: wordpress
    helm.sh/chart: wordpress-9.3.14
  name: wp-example-wordpress
  namespace: wp-example
spec:
  clusterIP: 10.101.142.74
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 31862
    port: 80
    protocol: TCP
    targetPort: http
  - name: https
    nodePort: 32473
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    app.kubernetes.io/instance: wp-example
    app.kubernetes.io/name: wordpress
  sessionAffinity: None
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: wp-example
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: wordpress
    helm.sh/chart: wordpress-9.3.14
  name: wp-example-wordpress
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: wp-example
      app.kubernetes.io/name: wordpress
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: wp-example
        app.kubernetes.io/managed-by: Tiller
        app.kubernetes.io/name: wordpress
        helm.sh/chart: wordpress-9.3.14
    spec:
      containers:
      - env:
        - name: ALLOW_EMPTY_PASSWORD
          value: "yes"
        - name: MARIADB_HOST
          value: wp-example-mariadb
        - name: MARIADB_PORT_NUMBER
          value: "3306"
        - name: WORDPRESS_DATABASE_NAME
          value: bitnami_wordpress
        - name: WORDPRESS_DATABASE_USER
          value: bn_wordpress
        - name: WORDPRESS_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: mariadb-password
              name: wp-example-mariadb
        - name: WORDPRESS_USERNAME
          value: user
        - name: WORDPRESS_PASSWORD
          valueFrom:
            secretKeyRef:
              key: wordpress-password
              name: wp-example-wordpress
        - name: WORDPRESS_EMAIL
          value: [email protected]
        - name: WORDPRESS_FIRST_NAME
          value: FirstName
        - name: WORDPRESS_LAST_NAME
          value: LastName
        - name: WORDPRESS_HTACCESS_OVERRIDE_NONE
          value: "no"
        - name: WORDPRESS_HTACCESS_PERSISTENCE_ENABLED
          value: "no"
        - name: WORDPRESS_BLOG_NAME
          value: "User's Blog!"
        - name: WORDPRESS_SKIP_INSTALL
          value: "no"
        - name: WORDPRESS_TABLE_PREFIX
          value: wp_
        - name: WORDPRESS_SCHEME
          value: http
        image: docker.io/bitnami/wordpress:5.4.2-debian-10-r6
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /wp-login.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 120
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: wordpress
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        - containerPort: 8443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 6
          httpGet:
            path: /wp-login.php
            port: http
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          requests:
            cpu: 300m
            memory: 512Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /bitnami/wordpress
          name: wordpress-data
          subPath: wordpress
      dnsPolicy: ClusterFirst
      hostAliases:
      - hostnames:
        - status.localhost
        ip: 127.0.0.1
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 1001
        runAsUser: 1001
      terminationGracePeriodSeconds: 30
      volumes:
      - name: wordpress-data
        persistentVolumeClaim:
          claimName: wp-example-wordpress

Output of kubectl describe svc wp-example-wordpress -n wp-example

Name:                     wp-example-wordpress
Namespace:                wp-example
Labels:                   app.kubernetes.io/instance=wp-example
                          app.kubernetes.io/managed-by=Tiller
                          app.kubernetes.io/name=wordpress
                          helm.sh/chart=wordpress-9.3.14
Annotations:              <none>
Selector:                 app.kubernetes.io/instance=wp-example,app.kubernetes.io/name=wordpress
Type:                     LoadBalancer
IP:                       10.101.142.74
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  31862/TCP
Endpoints:                10.32.0.17:8080
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32473/TCP
Endpoints:                10.32.0.17:8443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>

josh@Joshs-MacBook-Pro-2:$ ab -n 10000 -c 10 http://wp-example.cryptexlabs.com/
This is ApacheBench, Version 2.3 <$Revision: 1874286 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wp-example.cryptexlabs.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        Apache/2.4.43
Server Hostname:        wp-example.cryptexlabs.com
Server Port:            80

Document Path:          /
Document Length:        26225 bytes

Concurrency Level:      10
Time taken for tests:   37.791 seconds
Complete requests:      10000
Failed requests:        5000
   (Connect: 0, Receive: 0, Length: 5000, Exceptions: 0)
Non-2xx responses:      5000
Total transferred:      133295000 bytes
HTML transferred:       131230000 bytes
Requests per second:    264.61 [#/sec] (mean)
Time per request:       37.791 [ms] (mean)
Time per request:       3.779 [ms] (mean, across all concurrent requests)
Transfer rate:          3444.50 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    6   8.1      5     239
Processing:     4   32  29.2     39     315
Waiting:        4   29  26.0     34     307
Total:          7   38  31.6     43     458

Percentage of the requests served within a certain time (ms)
  50%     43
  66%     49
  75%     51
  80%     52
  90%     56
  95%     60
  98%     97
  99%    180
 100%    458 (longest request)

The Traefik debug logs (https://pastebin.com/QUaAR6G0) show something about SSL and x509 certificates, even though I'm making the request via HTTP, not HTTPS.

I did a test with an nginx container that uses the same pattern and I did not have any issues. So this has something to do specifically with the relationship between wordpress and traefik.

I also saw a reference, regarding Traefik, to Keep-Alive not being enabled on the downstream server while Traefik has Keep-Alive enabled by default. I tried enabling Keep-Alive on WordPress by extending the WordPress image. When I access the WordPress container through `kubectl port-forward`, I can see that the Keep-Alive headers are being sent, so I know it's enabled, but I am still seeing 50% of the requests fail.

Josh Woodcock asked Jun 21 '20 14:06



1 Answer

I saw in the Traefik logs that HTTP connections are fine, but when HTTPS redirections happen (for the favicon, etc.) you get an x509 "certificate not valid" error. That's because the WordPress pod has an SSL certificate that's not valid.
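The "every other request" pattern may also be related to how the IngressRoute in the question is written: it lists the same backend twice, once on port 80 and once on port 443. As far as I know, Traefik balances requests across all services listed under one route and talks HTTPS to a backend on port 443, which would fit half of the requests hitting the invalid certificate. A minimal sketch, assuming plain HTTP between Traefik and the pod is acceptable and reusing the names from the question, lists only port 80:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: wp-example.cryptexlabs.com
  namespace: wp-example
spec:
  entryPoints:
  - web
  routes:
  - kind: Rule
    match: Host(`wp-example.cryptexlabs.com`)
    services:
    # Only the plain-HTTP port is listed, so requests on this route
    # are never sent to the pod's HTTPS port and its certificate.
    - name: wp-example
      port: 80
```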

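You can set --serversTransport.insecureSkipVerify=true on Traefik. Inside your cluster this is reasonably safe: traffic between Traefik and the pod is still encrypted (the certificate is just not verified), and outside traffic is HTTP anyway.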
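A sketch of where that flag would go, assuming the Traefik Deployment from the question. Only the last argument is new; the ACME arguments from the question are left out here for brevity and would stay as they are:

```yaml
# Fragment of the traefik Deployment (spec.template.spec) from the question.
containers:
- name: traefik
  image: traefik:2.2
  args:
  - --api.insecure
  - --accesslog
  - --entrypoints.web.Address=:80
  - --entrypoints.websecure.Address=:443
  - --providers.kubernetescrd
  # Skip verification of backend (pod) certificates. Traffic to the pod
  # is still encrypted, but the pod's identity is no longer checked.
  - --serversTransport.insecureSkipVerify=true
```

Note that this is a global setting: it applies to every backend Traefik talks to over HTTPS, not only to WordPress.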

If you need to use a trusted certificate in the future, deploy it with the WordPress app and use Traefik with SSL passthrough, so that traffic is decrypted at the pod level. Then you can remove the insecure option from Traefik.
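A rough sketch of what passthrough could look like with Traefik's IngressRouteTCP resource, assuming the `websecure` entry point and the `wp-example-wordpress` Service from the question. The resource name `wp-example-passthrough` is hypothetical, and the WordPress pod would need to serve a certificate that clients trust:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: wp-example-passthrough   # hypothetical name
  namespace: wp-example
spec:
  entryPoints:
  - websecure
  routes:
  # TCP routers match on the SNI host name sent in the TLS handshake.
  - match: HostSNI(`wp-example.cryptexlabs.com`)
    services:
    - name: wp-example-wordpress
      port: 443
  tls:
    # Forward the encrypted stream untouched; TLS terminates in the pod.
    passthrough: true
```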

Akin Ozer answered Oct 17 '22 15:10
