Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

my coreos/fleet deployed service is dying and I can't tell why

Tags:

docker

coreos

I'm trying to deploy nsqlookupd using fleet on a brand shiny new coreos cluster in EC2. Here is my systemd unit file:

[Unit]
Description=nsqlookupd service
After=docker.service
Requires=docker.service

[Service]
EnvironmentFile=/etc/environment
ExecStartPre=-/usr/bin/docker kill nsqlookupd
ExecStartPre=-/usr/bin/docker rm nsqlookupd
ExecStart=/usr/bin/docker run -d --name=nsqlookupd -e BROADCAST_ADDRESS=$COREOS_PUBLIC_IPV4 -p 4160:4160 -p 4161:4161 mikedewar/nsqlookupd
ExecStartPost=/usr/bin/etcdctl set /nsqlookupd_broadcast_address $COREOS_PUBLIC_IPV4
ExecStop=/usr/bin/docker stop -t 1 nsqlookupd
ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address

I've verified the container works fine if I just run the ExecStart command. My docker logs just look like

~ $ docker logs nsqlookupd
2014/08/08 02:23:58 nsqlookupd v0.2.29-alpha (built w/go1.2.2)
2014/08/08 02:23:58 TCP: listening on [::]:4160
2014/08/08 02:23:58 HTTP: listening on [::]:4161

and my fleetctl journal looks like

$ fleetctl journal nsqlookupd.service
-- Logs begin at Sun 2014-08-03 12:49:00 UTC, end at Fri 2014-08-08 02:30:06 UTC. --
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Starting nsqlookupd service...
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6140]: 2014/08/08 02:23:57 Error: failed to kill one or more containers
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: Error response from daemon: No such container: nsqlookupd
Aug 08 02:23:57 ip-10-147-9-249 docker[6148]: 2014/08/08 02:23:57 Error: failed to remove one or more containers
Aug 08 02:23:57 ip-10-147-9-249 etcdctl[6157]: 54.198.93.169
Aug 08 02:23:57 ip-10-147-9-249 systemd[1]: Started nsqlookupd service.
Aug 08 02:23:57 ip-10-147-9-249 docker[6155]: 0fce4465f61c092541ba9d4c4e89ce13c4d6bedc096519034ed585d7adb5e0d7
Aug 08 02:23:59 ip-10-147-9-249 docker[6194]: nsqlookupd

both of which look just fine. But the container dies quietly, and my fleetctl list-units gives

$ fleetctl list-units
UNIT                STATE       LOAD    ACTIVE          SUB     DESC                MACHINE
nsqlookupd.service  launched    loaded  deactivating    stop    nsqlookupd service  1320802c.../10.147.9.249

Running docker images is a little worrying:

$ docker images
REPOSITORY             TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
<none>                 <none>              8ef9d8f9d18d        9 minutes ago       710 MB
mikedewar/nsqadmin     latest              432af572bda8        2 days ago          710 MB
mikedewar/nsqd         latest              00bd4e474964        2 days ago          710 MB
<none>                 <none>              adf0ed97208e        3 weeks ago         710 MB
mikedewar/nsqlookupd   latest              2219c0e783d9        3 weeks ago         710 MB
<none>                 <none>              35d2212f8932        3 weeks ago         710 MB
mikedewar/nsq          latest              f9794fe056e1        3 weeks ago         710 MB
busybox                latest              a9eb17255234        9 weeks ago         2.433 MB
zmarcantel/cassandra   latest              b1168b45b4f8        4 months ago        738 MB

as I've been updating mikedewar/nsqlookupd quite regularly over the last 3 weeks. Maybe that's the time I first pushed something to docker hub? I'd love to know that the image I'm working with is the up-to-date one. I've tried docker rmi mikedewar/nsqlookupd followed by docker pull mikedewar/nsqlookupd but the CREATED column still says it was created 3 weeks ago.

I don't know if this is useful, but the ExecStopPost=/usr/bin/etcdctl rm /nsqlookupd_broadcast_address command seems to have worked - the etcdctl log line in the fleet journal suggests I managed to set the key to my IP, but after the container dies I can't get that key from etcd.

Any help on where to look next for clues, or any ideas why this is happening would be greatly appreciated! As is probably clear I'm rather new to this sort of thing...

like image 658
Mike Dewar Avatar asked Aug 08 '14 02:08

Mike Dewar


People also ask

Why is CoreOS no longer supporting fleet?

Reason: On December 22, 2016, CoreOS announced that it will no longer maintain fleet; It will receive security updates and bug fixes until February of 2017, when it will be removed from CoreOS. The project recommends using Kubernetes for all clustering needs.

Can I use Kubernetes on CoreOS without fleet?

See Instead: For guidance using Kubernetes on CoreOS without fleet, see the Kubernetes on CoreOS Documentation. CoreOS provides an excellent environment for managing Docker containers across multi-server environments. One of the most essential components for making this cluster management simple is a service called fleet.

Why is my CoreOS cluster failing to come up correctly?

One of the most common issues that new and experienced CoreOS users run into when a cluster is failing to come up correctly is an invalid cloud-config file. CoreOS requires that a cloud-config file be passed into your server upon creation.

How can I see the systemd configuration files created by CoreOS?

When your CoreOS machine processes the cloud-config file, it generates stub systemd unit files that it uses to start up fleet and etcd. To see the systemd configuration files that were created and are being used to start your services, change to the directory where they were dropped:


1 Answers

You shouldn't run docker containers in detached mode in a unit file. Your execstart contains it: ExecStart=/usr/bin/docker run -d. This will cause systemd to think the process exited immediately since it was forked into the background.

As for managing versions, if you want to be absolutely sure you're getting the latest copy, you should tag your containers and then pull mikedewar/nsqlookupd:1.2.3. You can increment this each time in your fleet unit file.

like image 116
Rob Avatar answered Oct 27 '22 20:10

Rob