I have installed rabbitmq using helm chart on a kubernetes cluster. The rabbitmq pod keeps restarting. On inspecting the pod logs I get the below error
2020-02-26 04:42:31.582 [warning] <0.314.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-02-26 04:42:31.582 [info] <0.314.0> Waiting for Mnesia tables for 30000 ms, 6 retries left
When I try to do kubectl describe pod I get this error
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
data:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-rabbitmq-0
ReadOnly: false
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rabbitmq-config
Optional: false
healthchecks:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: rabbitmq-healthchecks
Optional: false
rabbitmq-token-w74kb:
Type: Secret (a volume populated by a Secret)
SecretName: rabbitmq-token-w74kb
Optional: false
QoS Class: Burstable
Node-Selectors: beta.kubernetes.io/arch=amd64
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 3m27s (x878 over 7h21m) kubelet, gke-analytics-default-pool-918f5943-w0t0 Readiness probe failed: Timeout: 70 seconds ...
Checking health of node [email protected] ...
Status of node [email protected] ...
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
I have provisioned the above on Google Cloud on a kubernetes cluster. I am not sure during what specific situation it started failing. I had to restart the pod and since then it has been failing.
What is the issue here ?
... Will be able to clear mnesia state to restart; there's nothing really valuable there. Edit rabbitmq1-rabbitmq statefulset in insert 'rabbitmqctl force_boot' Find existing rabbitmq pods. Delete the pods from Step 2. Wait for rabbitmq pod (s) to be re-created and OpenStack Deployment State to say RUNNING.
I am not a RabbitMQ expert but in the documentation you can see this: Normally when you shut down a RabbitMQ cluster altogether, the first node you restart should be the last one to go down, since it may have seen things happen that other nodes did not.
We'd really prefer availability over integrity. Don't use persistence. Are you able to know the order in which the nodes were shut down? I am not a RabbitMQ expert but in the documentation you can see this:
When the cookie is misconfigured (for example, not identical), RabbitMQ nodes will log errors such as "Connection attempt from disallowed node", "", "Could not auto-cluster". For example, when a CLI tool connects and tries to authenticate using a mismatching secret value:
TLDR
helm upgrade rabbitmq --set clustering.forceBoot=true
Problem
The problem happens for the following reason:
So in the end, it's a bit of a mismatch between how RabbitMQ and StatefulSets work. RMQ says: "if everything goes down, just start everything and the same time, one will be able to start and as soon as this one is up, the others can rejoin the cluster." k8s StatefulSets say: "starting everything all at once is not possible, we'll start with the 0".
Solution
To fix this, there is a force_boot command for rabbitmqctl which basically tells an instance to start standalone if it doesn't find any peers. How you can use this from Kubernetes depends on the Helm chart and container you're using. In the Bitnami Chart, which uses the Bitnami Docker image, there is a value clustering.forceBoot = true
, which translates to an env variable RABBITMQ_FORCE_BOOT = yes
in the container, which will then issue the above command for you.
But looking at the problem, you can also see why deleting PVCs will work (other answer). The pods will just all "forget" that they were part of a RMQ cluster the last time around, and happily start. I would prefer the above solution though, as no data is being lost.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With