
RabbitMQ - Error while waiting for Mnesia tables

I have installed RabbitMQ using a Helm chart on a Kubernetes cluster. The RabbitMQ pod keeps restarting. On inspecting the pod logs I get the error below:

2020-02-26 04:42:31.582 [warning] <0.314.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-02-26 04:42:31.582 [info] <0.314.0> Waiting for Mnesia tables for 30000 ms, 6 retries left

When I run kubectl describe pod I see the following:

Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-rabbitmq-0
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config
    Optional:  false
  healthchecks:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-healthchecks
    Optional:  false
  rabbitmq-token-w74kb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-token-w74kb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                      From                                               Message
  ----     ------     ----                     ----                                               -------
  Warning  Unhealthy  3m27s (x878 over 7h21m)  kubelet, gke-analytics-default-pool-918f5943-w0t0  Readiness probe failed: Timeout: 70 seconds ...
Checking health of node [email protected] ...
Status of node [email protected] ...
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
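
For reference, the probe output above ("Checking health of node ...", "Status of node ...") looks like the output of the standard rabbitmqctl health commands, so the same failure can be reproduced by hand; a rough sketch, assuming the stuck pod is rabbitmq-0 (matching the data-rabbitmq-0 claim above):

kubectl exec rabbitmq-0 -- rabbitmqctl node_health_check
kubectl exec rabbitmq-0 -- rabbitmqctl status
kubectl exec rabbitmq-0 -- rabbitmqctl list_vhosts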

I have provisioned the above on a Kubernetes cluster on Google Cloud. I am not sure what specific situation caused it to start failing. I had to restart the pod, and it has been failing ever since.

What is the issue here?

asked Feb 26 '20 by jeril

1 Answer

TLDR

helm upgrade rabbitmq bitnami/rabbitmq --set clustering.forceBoot=true

Problem

The problem happens for the following reason:

  • All RMQ pods are terminated at the same time for some reason (maybe because you explicitly set the StatefulSet replicas to 0, or something else).
  • One of them is the last one to stop (maybe just a tiny bit after the others). It stores this condition ("I'm standalone now") in its filesystem, which in k8s is the PersistentVolume(Claim). Let's say this pod is rabbitmq-1.
  • When you spin the StatefulSet back up, the pod rabbitmq-0 is always the first to start (see here).
  • During startup, pod rabbitmq-0 first checks whether it's supposed to run standalone. But as far as it can see on its own filesystem, it's part of a cluster, so it waits for its peers (the "Waiting for Mnesia tables" messages in your logs) and never finds them. By default this results in a startup failure.
  • rabbitmq-0 thus never becomes ready.
  • rabbitmq-1 never starts, because that's how StatefulSets are deployed: one after another. If it were allowed to start, it would come up successfully, because it knows it may run standalone.

So in the end, it's a bit of a mismatch between how RabbitMQ and StatefulSets work. RMQ says: "if everything goes down, just start everything at the same time; one of them will be able to start, and as soon as it is up, the others can rejoin the cluster." k8s StatefulSets say: "starting everything all at once is not possible; we'll start with pod 0".
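
You can see both halves of that mismatch on the cluster itself; a rough sketch, assuming the StatefulSet is named rabbitmq (matching the pod and PVC names above) and that the chart labels its pods with app=rabbitmq:

# OrderedReady (the default) means pods are started strictly one after another
kubectl get statefulset rabbitmq -o jsonpath='{.spec.podManagementPolicy}'

# Typically shows rabbitmq-0 stuck at 0/1 Ready while the higher-numbered pods never appear
kubectl get pods -l app=rabbitmq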

Solution

To fix this, there is a force_boot command for rabbitmqctl which basically tells an instance to start standalone if it doesn't find any peers. How you can use this from Kubernetes depends on the Helm chart and container you're using. In the Bitnami Chart, which uses the Bitnami Docker image, there is a value clustering.forceBoot = true, which translates to an env variable RABBITMQ_FORCE_BOOT = yes in the container, which will then issue the above command for you.
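
A rough sketch of both ways to set that value; the release name rabbitmq and the bitnami/rabbitmq chart reference are assumptions based on the names in the question:

# One-off: re-render the StatefulSet with the flag enabled, keeping the other release settings
helm upgrade rabbitmq bitnami/rabbitmq --reuse-values --set clustering.forceBoot=true

# Or keep it in values.yaml so it survives future upgrades
clustering:
  forceBoot: true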

But looking at the problem, you can also see why deleting the PVCs works (as the other answer suggests): the pods simply all "forget" that they were part of an RMQ cluster the last time around and happily start. I would prefer the solution above, though, as no data is lost.
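
For completeness, that data-destroying alternative would look roughly like this, assuming a StatefulSet named rabbitmq whose PVCs follow the data-rabbitmq-N naming seen in the describe output (adjust the claim names and replica count to your cluster):

# WARNING: this wipes everything RabbitMQ stored on the volumes (queues, messages, users, vhosts)
kubectl scale statefulset rabbitmq --replicas=0
kubectl delete pvc data-rabbitmq-0 data-rabbitmq-1 data-rabbitmq-2
kubectl scale statefulset rabbitmq --replicas=3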

answered Oct 25 '22 by Ulli