My CoreOS/fleet-deployed service is dying and I can't tell why

Tags:

docker

coreos

I'm trying to deploy nsqlookupd using fleet on a brand shiny new coreos cluster in EC2. Here is my systemd unit file:

[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
ExecStart=/usr/bin/docker run -d --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address

I've verified the container works fine if I just run the ExecStart command. My docker logs just look like

~ $ docker logs nsqlookupd
2014/08/08 02:23:58 nsqlookupd v0.2.29-alpha (built w/go1.2.2)
2014/08/08 02:23:58 TCP: listening on [::]:4160
2014/08/08 02:23:58 HTTP: listening on [::]:4161

and my fleetctl journal looks like

$ fleetctl journal nsqlookupd.service
-- Logs begin at Sun 2014-08-03 12:49:00 UTC, end at Fri 2014-08-08 02:30:06 UTC. --
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Starting nsqlookupd service...
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: 2014/08/08 02:23:57 Error: failed to kill one or more containers
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: 2014/08/08 02:23:57 Error: failed to remove one or more containers
Aug 08 02:23:57 ip-10-147-9-249 etcdctl[6157]: 54.198.93.169
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Started nsqlookupd service.
Aug 08 02:23:57 ip-10-147-9-249 docker[6155]: 0fce4465f61c092541ba9d4c4e89ce13c4d6bedc096519034ed585d7adb5e0d7
Aug 08 02:23:59 ip-10-147-9-249 docker[6194]: nsqlookupd

both of which look just fine. But the container dies quietly, and my fleetctl list-units gives

$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE          SUB     DESC                MACHINE
nsqlookupd.service  launched    loaded  deactivating    stop    nsqlookupd service  1320802c.../10.147.9.249

Running docker images is a little worrying:

$ docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
<none>                 <none>              8ef9d8f9d18d        9 minutes ago       710 MB
mikedewar/nsqadmin     latest              432af572bda8        2 days ago          710 MB
mikedewar/nsqd         latest              00bd4e474964        2 days ago          710 MB
<none>                 <none>              adf0ed97208e        3 weeks ago         710 MB
mikedewar/nsqlookupd   latest              2219c0e783d9        3 weeks ago         710 MB
<none>                 <none>              35d2212f8932        3 weeks ago         710 MB
mikedewar/nsq          latest              f9794fe056e1        3 weeks ago         710 MB
busybox                latest              a9eb17255234        9 weeks ago         2.433 MB
zmarcantel/cassandra   latest              b1168b45b4f8        4 months ago        738 MB

as I've been updating mikedewar/nsqlookupd quite regularly over the last 3 weeks. Maybe that's the time I first pushed something to docker hub? I'd love to know that the image I'm working with is the up-to-date one. I've tried docker rmi mikedewar/nsqlookupd followed by docker pull mikedewar/nsqlookupd but the CREATED column still says it was created 3 weeks ago.

I don't know if this is useful, but the ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address command seems to have worked - the etcdctl log line in the fleet journal suggests I managed to set the key to my IP, but after the container dies I can't get that key from etcd.

Any help on where to look next for clues, or any ideas why this is happening would be greatly appreciated! As is probably clear I'm rather new to this sort of thing...

asked Aug 08 '14 02:08

Mike Dewar

1 Answer

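Don't run Docker containers in detached mode from a unit file. Your ExecStart line does exactly that: ExecStart=/usr/bin/docker run -d. With -d, the docker client starts the container in the background and exits immediately, so systemd thinks the service's main process has exited. It then runs ExecStop and ExecStopPost, which is most likely why your journal shows a bare nsqlookupd line at 02:23:59 (the output of docker stop) and why the etcd key is gone afterwards. Drop the -d so that docker run stays in the foreground and systemd can supervise it.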
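As a rough sketch of that change, here is the unit from the question with -d removed and nothing else altered in intent. The ${...} brace form is my own substitution, used because systemd documents it for variables that appear inside a word; treat this as an untested starting point rather than a verified configuration.

```ini
[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
# The leading "-" tells systemd to ignore failures, e.g. when no old container exists.
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
# No -d: docker run stays in the foreground, so systemd tracks it as the main process.
# Assumption: ${VAR} is used instead of $VAR because the variable sits inside a word.
ExecStart=/usr/bin/docker run --name=nsqlookupd -e BROADCAST_ADDRESS=${COREOS_PUBLIC_IPV4} -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
# With the default service type this runs as soon as ExecStart has been launched,
# not when nsqlookupd is actually ready to accept connections.
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address ${COREOS_PUBLIC_IPV4}
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address
```

With this version the container's own output should also show up in fleetctl journal, since it is written to the stdout of the process systemd is watching.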

As for managing versions: if you want to be absolutely sure you're getting the latest copy, tag your images with a version and pull that exact tag, for example mikedewar/nsqlookupd:1.2.3. You can then bump the tag in your fleet unit file with each release.
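A minimal sketch of what pinning a tag might look like in the [Service] section, assuming the image has been pushed as mikedewar/nsqlookupd:1.2.3. That tag is only the placeholder from the paragraph above, and the explicit pull step is my own addition rather than something the question's unit contains.

```ini
[Service]
EnvironmentFile=/etc/environment
# Fetch the exact tagged image before starting; 1.2.3 is a placeholder version.
# A slow pull can run into systemd's start timeout, which TimeoutStartSec= can raise.
ExecStartPre=/usr/bin/docker pull mikedewar/nsqlookupd:1.2.3
# Run the same pinned tag in the foreground (no -d), so the unit never falls back to "latest".
ExecStart=/usr/bin/docker run --name=nsqlookupd -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd:1.2.3
```

Pinning the tag in both lines means the version that runs is visible in the unit file itself, which makes it easier to tell which build a machine is actually running.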

answered Oct 27 '22 20:10

Rob
