I'm running several docker containers with restart=always
on Ubuntu 18.04.1 LTS. The physical server reboots every morning at 2am via cronjob executing reboot now
.
So far, I haven't had any problems with that in the past 5 or 6 months running that particular setup.
But today, containers didn't start after the daily reboot. The output of docker ps
was empty, all containers were in state "Exited".
Why does this happen all of a sudden? Was my setup mis-configured from the beginning, or does the recent docker-ce
package upgrade play a role?
Here are logs before and after the reboot as well as the docker.service
unit and version info:
root@skprov2:~# journalctl -b -1 -x -u docker
Nov 15 02:00:02 skprov2 systemd[1]: Stopping Docker Application Container Engine...
-- Subject: Unit docker.service has begun shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has begun shutting down.
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.189764841+01:00" level=info msg="Processing signal 'terminated'"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.595098434+01:00" level=info msg="shim reaped" id=c929d444a6eb59a69a0da738ca782a9feb92ac1f80e5c4576bf85376c3d4c17a
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.601217756+01:00" level=info msg="shim reaped" id=98a8c1b99cf986e6a889474f0fc28fe3635e466b21f8a37ef3c10a1050495c78
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.604880385+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.670918937+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.732991633+01:00" level=info msg="shim reaped" id=9b3badc752786df08d00138c0222042a6bd80bb2c971f5a96b71e57105cea95c
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.748732351+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.843982385+01:00" level=info msg="shim reaped" id=ae7531405113db8b4754491a12c2ababf09fa0c8f501bfe6f1b33e3ff18b6462
Nov 15 02:00:02 skprov2 dockerd[1504]: time="2018-11-15T02:00:02.869023019+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:03 skprov2 dockerd[1504]: time="2018-11-15T02:00:03.863568729+01:00" level=info msg="shim reaped" id=b335536f5f07b1db3f32ba4452fc4aadacc02c6184cef7fc9df619ab81bbf002
Nov 15 02:00:04 skprov2 dockerd[1504]: time="2018-11-15T02:00:04.279347144+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.233635995+01:00" level=info msg="Container 77e00eabebea97357f05f564597d167acad8f2596b25d295b4366baf08ef3127 failed to exit within 10 seconds of signal 15 - using the force"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.253563540+01:00" level=info msg="Container 7f7d2a92bcdbb240a9400942c9301f5cd77bf9d3fbde1d38f41a2bd1226f9b09 failed to exit within 10 seconds of signal 15 - using the force"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.253563179+01:00" level=info msg="Container f6b49cc85eb7f9226ac192498b1e319d68e0de2faff6b4e3e67adabba43a093a failed to exit within 10 seconds of signal 15 - using the force"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.654403249+01:00" level=info msg="shim reaped" id=7f7d2a92bcdbb240a9400942c9301f5cd77bf9d3fbde1d38f41a2bd1226f9b09
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.679675304+01:00" level=info msg="shim reaped" id=f6b49cc85eb7f9226ac192498b1e319d68e0de2faff6b4e3e67adabba43a093a
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.680699340+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:12 skprov2 dockerd[1504]: time="2018-11-15T02:00:12.689801078+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:13 skprov2 dockerd[1504]: time="2018-11-15T02:00:13.088891655+01:00" level=info msg="shim reaped" id=77e00eabebea97357f05f564597d167acad8f2596b25d295b4366baf08ef3127
Nov 15 02:00:13 skprov2 dockerd[1504]: time="2018-11-15T02:00:13.111193244+01:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.233286510+01:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.233684167+01:00" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.233695697+01:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.moby
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.234287398+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4209ec610, TRANSIENT_FAILURE" module=grpc
Nov 15 02:00:15 skprov2 dockerd[1504]: time="2018-11-15T02:00:15.234328545+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc4209ec610, CONNECTING" module=grpc
Nov 15 02:00:16 skprov2 systemd[1]: Stopped Docker Application Container Engine.
-- Subject: Unit docker.service has finished shutting down
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has finished shutting down.
==================================================================================
==================================================================================
==================================================================================
root@skprov2:~# journalctl -b 0 -x -u docker
-- Logs begin at Thu 2018-07-05 13:16:23 CEST, end at Thu 2018-11-15 08:16:31 CET. --
Nov 15 02:04:00 skprov2 systemd[1]: Starting Docker Application Container Engine...
-- Subject: Unit docker.service has begun start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has begun starting up.
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.152961544+01:00" level=info msg="systemd-resolved is running, so using resolvconf: /run/systemd/resolve/resolv.conf"
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.432271212+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.432315437+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.439772198+01:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.439800208+01:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.471564855+01:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.471618580+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.471653422+01:00" level=info msg="ccResolverWrapper: sending new addresses to cc: [{unix:///run/containerd/containerd.sock 0 <nil>}]" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475146270+01:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475678777+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420abc010, CONNECTING" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475795536+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42080b0b0, CONNECTING" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.475769194+01:00" level=info msg="blockingPicker: the picked transport is not ready, loop back to repick" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.476273893+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc42080b0b0, READY" module=grpc
Nov 15 02:04:12 skprov2 dockerd[1690]: time="2018-11-15T02:04:12.476346309+01:00" level=info msg="pickfirstBalancer: HandleSubConnStateChange: 0xc420abc010, READY" module=grpc
Nov 15 02:04:14 skprov2 dockerd[1690]: time="2018-11-15T02:04:14.769703354+01:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.247573731+01:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.247926863+01:00" level=warning msg="Your kernel does not support swap memory limit"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.247998928+01:00" level=warning msg="Your kernel does not support cgroup rt period"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.248016977+01:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Nov 15 02:04:23 skprov2 dockerd[1690]: time="2018-11-15T02:04:23.254944197+01:00" level=info msg="Loading containers: start."
Nov 15 02:04:25 skprov2 dockerd[1690]: time="2018-11-15T02:04:25.856323528+01:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.182112549+01:00" level=error msg="Failed to start container c929d444a6eb59a69a0da738ca782a9feb92ac1f80e5c4576bf85376c3d4c17a: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.206030890+01:00" level=error msg="Failed to start container b335536f5f07b1db3f32ba4452fc4aadacc02c6184cef7fc9df619ab81bbf002: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.235647072+01:00" level=error msg="Failed to start container ae7531405113db8b4754491a12c2ababf09fa0c8f501bfe6f1b33e3ff18b6462: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.374241415+01:00" level=error msg="Failed to start container 9b3badc752786df08d00138c0222042a6bd80bb2c971f5a96b71e57105cea95c: id already in use"
Nov 15 02:04:35 skprov2 dockerd[1690]: time="2018-11-15T02:04:35.410173049+01:00" level=error msg="Failed to start container 7f7d2a92bcdbb240a9400942c9301f5cd77bf9d3fbde1d38f41a2bd1226f9b09: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.171600568+01:00" level=error msg="Failed to start container 98a8c1b99cf986e6a889474f0fc28fe3635e466b21f8a37ef3c10a1050495c78: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.970077586+01:00" level=error msg="Failed to start container f6b49cc85eb7f9226ac192498b1e319d68e0de2faff6b4e3e67adabba43a093a: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.993993749+01:00" level=error msg="Failed to start container 77e00eabebea97357f05f564597d167acad8f2596b25d295b4366baf08ef3127: id already in use"
Nov 15 02:04:36 skprov2 dockerd[1690]: time="2018-11-15T02:04:36.994202774+01:00" level=info msg="Loading containers: done."
Nov 15 02:04:37 skprov2 dockerd[1690]: time="2018-11-15T02:04:37.492457742+01:00" level=info msg="Docker daemon" commit=4d60db4 graphdriver(s)=overlay2 version=18.09.0
Nov 15 02:04:37 skprov2 dockerd[1690]: time="2018-11-15T02:04:37.494916840+01:00" level=info msg="Daemon has completed initialization"
Nov 15 02:04:37 skprov2 dockerd[1690]: time="2018-11-15T02:04:37.669139526+01:00" level=info msg="API listen on /var/run/docker.sock"
Nov 15 02:04:37 skprov2 systemd[1]: Started Docker Application Container Engine.
-- Subject: Unit docker.service has finished start-up
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- Unit docker.service has finished starting up.
--
-- The start-up result is RESULT.
==================================================================================
==================================================================================
==================================================================================
root@skprov2:~# cat /lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H unix://
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3
# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Comment TasksMax if your systemd version does not supports it.
# Only systemd 226 and above support this option.
TasksMax=infinity
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
[Install]
WantedBy=multi-user.target
==================================================================================
==================================================================================
==================================================================================
root@skprov2:~# docker info && docker version
Containers: 14
Running: 5
Paused: 0
Stopped: 9
Images: 61
Server Version: 18.09.0
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: c4446665cb9c30056f4998ed953e6d4ff22c7c39
runc version: 4fc53a81fb7c994640722ac585fa9ca548971871
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-39-generic
Operating System: Ubuntu 18.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 31.39GiB
Name: skprov2
ID: EDC2:AGFH:BHKP:P4HS:M5DA:ZPXM:AU6B:TV6E:6KIU:YC4S:F3NN:35A4
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
localhost:5000
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
WARNING: No swap limit support
Client:
Version: 18.09.0
API version: 1.39
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:49:01 2018
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 18.09.0
API version: 1.39 (minimum version 1.12)
Go version: go1.10.4
Git commit: 4d60db4
Built: Wed Nov 7 00:16:44 2018
OS/Arch: linux/amd64
Experimental: false
I stumbled on the same issue. In my case the issue was due to Docker not cleaning up after himself properly. As seen in Docker's log:
time="2018-12-31T17:38:54.330555181+02:00" level=error msg="2089c8095e62011b0dc05e66c51ae59d648d909ca7a8e806af0fdf39b2e3006c cleanup: failed to delete container from containerd: transport is closing: unknown"
These id
s are used on startup because of restart=always
.
So Docker says
time="2018-12-31T17:40:04.648261275+02:00" level=error msg="Failed to start container 2089c8095e62011b0dc05e66c51ae59d648d909ca7a8e806af0fdf39b2e3006c: id already in use"
Seems like the Docker daemon shuts down faster than the containers are cleaned (probably because of the container's ignoring signals or something)
So seems the solution for me was to change the daemon's shutdown-timeout
in the Docker daemon configuration file. The default is 10s or something, I changed it to 60s and no I don't experience these issues anymore.
I still think this is a legitimate scenario that should work out-of-the-box though.
The containerd "id already in use" errors appear to have been fixed in the latest release of Docker CE. Try upgrading to 18.09.3 to see if that corrects your issue.
https://github.com/docker/docker-ce/releases/tag/v18.09.3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With