
Consul not deregistering zombie services

I am deploying a simple hello world nginx container with Marathon, and everything seems to work well, except that I have 6 containers that will not deregister from Consul. docker ps shows that none of the containers are running.

I tried using the /v1/catalog/deregister endpoint to deregister the services, but they keep coming back. I then killed the registrator container, and tried deregistering again. They came back.

I am running registrator with

docker run -d --name agent-registrator -v /var/run/docker.sock:/tmp/docker.sock --net=host gliderlabs/registrator consul://127.0.0.1:8500 -deregister-on-success -cleanup

There is 1 consul agent running.

Restarting the machine (this is a single node installation on a local vm) does not make the services go away.

How do I make these containers go away?

Peter Klipfel asked Aug 27 '15 20:08


2 Answers

Here is how you can absolutely delete all the zombie services: Go into your consul server, find the location of the json files containing the zombies and delete them.

For example I am running consul in a container:

docker run --restart=unless-stopped -d -h consul0 --name consul0 -v /mnt:/data \
    -p $(hostname -i):8300:8300 \
    -p $(hostname -i):8301:8301 \
    -p $(hostname -i):8301:8301/udp \
    -p $(hostname -i):8302:8302 \
    -p $(hostname -i):8302:8302/udp \
    -p $(hostname -i):8400:8400 \
    -p $(hostname -i):8500:8500 \
    -p $(ifconfig docker0 | awk '/\<inet\>/ { print $2}' | cut -d: -f2):53:53/udp \
    progrium/consul -server -advertise $(hostname -i) -bootstrap-expect 3

Notice the flag -v /mnt:/data: this is where Consul stores all of its data. For me it was located in /mnt on the host. Under this directory you will find several other directories:

config raft serf services tmp

Go into services and you will see the files that contain the JSON info of your services. Find any that contain the info of zombies and delete them, then restart Consul. Repeat for each server in your cluster that has zombies on it.
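To find the right files without opening each one, something like the following sketch may help. It assumes the service ID appears as plain text inside the files in the services directory. The function name list_zombie_files is made up for illustration and is not part of Consul.

```shell
# Hypothetical helper: list files under a Consul "services" directory
# that mention a given service ID. Both arguments come from the caller.
list_zombie_files() {
    services_dir=$1
    service_id=$2
    # -r: recurse, -l: print only file names, -F: treat the ID as a literal string
    grep -rlF -- "$service_id" "$services_dir"
}
```

For the setup above this would be called as list_zombie_files /mnt/services SOME-SERVICE-ID, where SOME-SERVICE-ID is a placeholder for one of your zombie IDs. Review the printed file names before deleting anything.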

Alex Cohen answered Nov 04 '22 19:11


Don't use the catalog endpoint; use the agent endpoint instead. The catalog is maintained by the agents, so even if you remove a service from the catalog, the agent that owns it will sync it back. Here is a shell script that removes zombie services:

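# Ask any cluster member for the leader and strip the quotes and the :8300 RPC port.
leader="$(curl -s http://ONE-OF-YOUR-CLUSTER:8500/v1/status/leader | sed 's/:8300//' | sed 's/"//g')"
while :
do
    # Fetch the critical checks once so that serviceID and node come from the same entry.
    critical="$(curl -s "http://$leader:8500/v1/health/state/critical")"
    serviceID="$(echo "$critical" | ./jq -r '.[0].ServiceID')"
    node="$(echo "$critical" | ./jq -r '.[0].Node')"
    echo "serviceID=$serviceID, node=$node"
    size=${#serviceID}
    echo "size=$size"
    # jq prints "null" (4 characters) when nothing is critical;
    # a real service ID is assumed to be at least 7 characters long.
    if [ "$size" -ge 7 ]; then
        # Deregister on the agent that owns the service, not in the catalog.
        curl --request PUT "http://$node:8500/v1/agent/service/deregister/$serviceID"
    else
        break
    fi
done
# Show whatever is still critical.
curl "http://$leader:8500/v1/health/state/critical"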

The script uses the JSON parser jq to extract the fields.
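If you already know the ID of a zombie, the same agent endpoint can be called directly without the loop. This is a minimal sketch: the function names and the CONSUL_AGENT variable are made up for illustration, and the default address assumes a local agent on 127.0.0.1:8500, as in the question.

```shell
# Assumed variable name; defaults to the local agent.
CONSUL_AGENT="${CONSUL_AGENT:-127.0.0.1:8500}"

# Build the agent deregister URL for one service ID ($1).
deregister_url() {
    printf 'http://%s/v1/agent/service/deregister/%s\n' "$CONSUL_AGENT" "$1"
}

# Send the PUT request; --fail makes curl exit non-zero on HTTP errors.
deregister_service() {
    curl --silent --fail --request PUT "$(deregister_url "$1")"
}
```

Call it as deregister_service SOME-SERVICE-ID against the agent on the node that registered the service, since that agent is the one that would otherwise sync it back.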

Ning Marshall answered Nov 04 '22 17:11