I would like to be able to see all of the various things that happened to a kube cluster on a timeline, including when nodes were found to be dead, when new nodes were added, when pods crashed and when they were restarted.
So far the best that we have found is kubectl get event, but that seems to have a few limitations: it does not let us control how long events are retained, and it combines similar events into one.
One idea that I have is to write a pod that will use the API to watch the stream of events and log them to a file. This would let us control retention and it seems that events that occur while we are watching will not be combined, solving the second problem as well.
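For what it's worth, here is a minimal sketch of that idea in Python, assuming the official kubernetes client package is installed and the pod's service account is allowed to list and watch events. The function names, the log path and the layout of each log line are my own choices for illustration, not anything Kubernetes prescribes.

```python
import json
from datetime import datetime, timezone


def format_event(raw, change_type=None):
    """Turn one raw Event object (a dict as returned by the API) into a JSON log line."""
    involved = raw.get("involvedObject", {})
    # Nodes have no namespace, so only join the parts that are present.
    parts = [involved.get("kind"), involved.get("namespace"), involved.get("name")]
    record = {
        # Time at which this process saw the change on the watch stream.
        "seen_at": datetime.now(timezone.utc).isoformat(),
        "change": change_type,  # ADDED, MODIFIED or DELETED
        "first": raw.get("firstTimestamp"),
        "last": raw.get("lastTimestamp"),
        "count": raw.get("count"),
        "type": raw.get("type"),
        "reason": raw.get("reason"),
        "object": "/".join(p for p in parts if p),
        "message": raw.get("message"),
    }
    return json.dumps(record, sort_keys=True)


def watch_events(log_path):
    """Append every event change seen on the watch stream to log_path."""
    # Imported here so format_event can be used without the client installed.
    from kubernetes import client, config, watch

    config.load_incluster_config()  # assumes the script runs inside a pod
    v1 = client.CoreV1Api()
    with open(log_path, "a") as log:
        for change in watch.Watch().stream(v1.list_event_for_all_namespaces):
            # raw_object is the undecoded JSON dict for the Event.
            log.write(format_event(change["raw_object"], change["type"]) + "\n")
            log.flush()
```

Calling watch_events("/var/log/k8s-events.log") (a made-up path) would then run until the process is stopped. Every update to an existing event shows up as its own line with the count at that moment, so the file keeps a record of each change for as long as you choose to retain it.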
What are other people doing about this?
Compared to many other Kubernetes objects, events see a lot of activity. By default they have a time-to-live of one hour, and a separate etcd cluster for events is advised for scalability.
A Kubernetes event is an object that shows what's happening inside a cluster, node, pod, or container. These objects are usually generated in response to changes that occur inside your K8s system. The Kubernetes API Server enables all core components to create these events.
In a default Kubernetes setup, the events are persisted into etcd, a key-value store. etcd is optimized for quick, strongly consistent lookups, but it is not well suited to analytical queries over the data.
To collect or watch the events, you can run kubectl get events --watch in a Deployment and collect the output with a third-party logging tool. There are also many free and paid third-party tools that provide visibility into, and reporting on, the events in a Kubernetes cluster.
My understanding is that Kubernetes itself dedups events, documented here: https://github.com/kubernetes/kubernetes/blob/master/docs/design/event_compression.md Once that happens, there is no way to get the individual events back.
See https://github.com/kubernetes/kubernetes/issues/36304 for complaints about how that loses info. https://github.com/kubernetes/kubernetes/pull/46034 at least improved the message. See also the KEP at https://github.com/kubernetes/enhancements/pull/1291 for recent discussion and a proposal to improve usability in kubectl.
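To illustrate why the individual occurrences cannot be recovered after dedup, here is a deliberately simplified model in Python. It is not the real implementation: the input field names ("time", "object") are made up, and as far as I can tell from the design document the real key is built from more fields (source, involved object details, reason, message).

```python
def compress(occurrences):
    """Combine occurrences that share the same identifying fields.

    Simplified model of event compression: only the first time, the last
    time and a running count survive for each combined event.
    """
    combined = {}
    for occ in occurrences:
        key = (occ["object"], occ["reason"], occ["message"])
        entry = combined.get(key)
        if entry is None:
            combined[key] = {
                "object": occ["object"],
                "reason": occ["reason"],
                "message": occ["message"],
                "firstTimestamp": occ["time"],
                "lastTimestamp": occ["time"],
                "count": 1,
            }
        else:
            # A repeat only bumps the counter and overwrites the last time.
            entry["lastTimestamp"] = occ["time"]
            entry["count"] += 1
    return list(combined.values())


occurrences = [
    {"time": "10:00", "object": "pod/web-1", "reason": "BackOff",
     "message": "Back-off restarting failed container"},
    {"time": "10:07", "object": "pod/web-1", "reason": "BackOff",
     "message": "Back-off restarting failed container"},
    {"time": "10:31", "object": "pod/web-1", "reason": "BackOff",
     "message": "Back-off restarting failed container"},
]
result = compress(occurrences)
```

Here result holds a single entry with firstTimestamp 10:00, lastTimestamp 10:31 and count 3. The 10:07 occurrence is gone, which is the kind of information loss the linked issue complains about.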
How long are events retained? Their "time-to-live" is apparently controlled by the kube-apiserver --event-ttl option, which defaults to 1 hour:
https://github.com/kubernetes/kubernetes/blob/da53a247633/cmd/kube-apiserver/app/options/options.go#L71-L72
You can raise this. That might require more resources for etcd: from what I saw in some 2015 GitHub discussions, the event TTL used to be 2 days, and events were the main thing stressing etcd.
...
In a pinch, it might be possible to figure out what happened earlier from various logs, especially the kubelet logs?
Running kubectl get event -o yaml --watch and redirecting the output into a persistent file sounds like a simple thing to do. I think when you watch events as they arrive, you see them pre-dedup.
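If you would rather wrap the kubectl command than use the API directly, a small Python sketch could look like the following. It assumes kubectl is on the PATH and already configured for the cluster. The --all-namespaces flag is my addition to the command above, and the function names and output path are hypothetical.

```python
import subprocess

# The command from above, plus --all-namespaces (an addition, drop it if unwanted).
KUBECTL_CMD = ["kubectl", "get", "events", "--all-namespaces", "-o", "yaml", "--watch"]


def append_lines(lines, path):
    """Append each line to path, flushing after every write so that
    little is lost if the process dies. Returns the number of lines written."""
    written = 0
    with open(path, "a") as out:
        for line in lines:
            out.write(line if line.endswith("\n") else line + "\n")
            out.flush()
            written += 1
    return written


def record_events(path, cmd=KUBECTL_CMD):
    """Run kubectl and append everything it prints to path until it exits."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    try:
        return append_lines(proc.stdout, path)
    finally:
        proc.terminate()
```

Something like record_events("/var/log/k8s-events.yaml") would then keep appending for as long as the watch stays open. The file is opened in append mode, so restarting the script after the watch drops does not overwrite earlier output, and retention is whatever you configure for that file.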
Heapster can send events to some of the supported sinks: https://github.com/kubernetes/heapster/blob/master/docs/sink-configuration.md
Eventrouter can send events to various sinks: https://github.com/heptiolabs/eventrouter/tree/master/sinks