We would like to spin up pods quickly on our cluster, to handle 'one-off' tasks (the idea being that each task has a new pod every time it runs).
Currently, it takes about 10-15 seconds from the Pod creation API call to completion. This is running on three m3.xlarge instances on AWS, with images that should already be cached (I presume so, since I am using the same image twice on a single node). We are running with restartPolicy = Never, as these are one-off tasks.
I've tried fiddling with the imagePullPolicy (= Never) and the resource options, to no avail. The 10-second delay appears to happen in the 'Running' phase, after Kubernetes has handed it off to the Pod. I can confirm the operation itself is very quick: running locally on Docker takes only about 0.5s in total, including the operation.
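The setup described above corresponds roughly to a manifest like the one built in the following sketch. The function name `one_off_pod_manifest`, the pod name `one-off-task`, the image `example/task:latest`, the command, and the `v1` API version are all assumptions for illustration, not the actual values in use.

```python
import json


def one_off_pod_manifest(name, image, command):
    """Build a Pod manifest (as a plain dict) for a run-once task.

    All argument values are supplied by the caller; nothing here
    talks to a cluster.
    """
    return {
        "apiVersion": "v1",  # assumption: adjust to the cluster's API version
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Never restart: the task runs once and the pod terminates.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Never pull: rely on the image already being on the node.
                    "imagePullPolicy": "Never",
                    "command": command,
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    manifest = one_off_pod_manifest(
        "one-off-task", "example/task:latest", ["/bin/true"]
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `kubectl create -f -` to create the pod.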
Is there any way to speed this up?
Our target is 5s latency from creation to Running (assuming the image is pre-pulled). The issue tracking this was https://github.com/GoogleCloudPlatform/kubernetes/issues/3954.
That issue was closed a couple of weeks ago, so please update to version 20.2 and give it another try.
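When re-testing, it may help to see where the time actually goes. The sketch below splits the total into "creation to container start" and "container run time" using timestamps from the pod object, as returned by `kubectl get pod <name> -o json`. The helper names `parse_ts` and `startup_breakdown` are hypothetical, the sample timestamps are made up, and the code assumes a single container that has already terminated.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a Kubernetes timestamp such as '2015-06-01T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def startup_breakdown(pod):
    """Return latency components (in seconds) for a finished one-off pod.

    Assumes exactly one container whose state is 'terminated'.
    """
    created = parse_ts(pod["metadata"]["creationTimestamp"])
    terminated = pod["status"]["containerStatuses"][0]["state"]["terminated"]
    started = parse_ts(terminated["startedAt"])
    finished = parse_ts(terminated["finishedAt"])
    return {
        "create_to_start_s": (started - created).total_seconds(),
        "run_s": (finished - started).total_seconds(),
        "total_s": (finished - created).total_seconds(),
    }


if __name__ == "__main__":
    # Made-up sample data shaped like the relevant parts of a pod object.
    sample_pod = {
        "metadata": {"creationTimestamp": "2015-06-01T12:00:00Z"},
        "status": {
            "containerStatuses": [
                {
                    "state": {
                        "terminated": {
                            "startedAt": "2015-06-01T12:00:09Z",
                            "finishedAt": "2015-06-01T12:00:10Z",
                        }
                    }
                }
            ]
        },
    }
    print(startup_breakdown(sample_pod))
```

These timestamps have one-second resolution, so the breakdown is only a coarse indicator. It is still enough to tell whether a 10-second delay sits before the container starts or inside the container's own run.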