Is there any way to scale Kubernetes nodes based on network utilization and not based on memory or CPU?
Let's say for example you are sending thousands of requests to a couple of nodes behind a load balancer. The CPU is not struggling or the memory, but because there are thousands of requests per second you would need additional nodes to serve this. How can you do this in Google Cloud Kubernetes?
I have been researching around but I can't seem to find any references to this type of scaling, and I am guessing I am not the only one to come across this problem. So I am wondering if any of you knows of any best practice solutions.
I guess the ideal solution would be to have one pod per node receiving requests and creating more nodes based on more requests and scale up or down based on this.
In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), with the aim of automatically scaling the workload to match demand. Horizontal scaling means that the response to increased load is to deploy more Pods.
By default, the Horizontal Pod Autoscaler checks Pods metrics every 15 seconds. In the worst-case scenario, the Horizontal Pod Autoscaler can take up to 1 minute and a half to trigger the autoscaling (i.e. 10s + 60s + 15s). The Cluster Autoscaler checks for unschedulable Pods in the cluster every 10 seconds.
The kubectl scale method is the fastest way to scale. However, you may prefer another method in some situations, like when updating configuration files or when performing in-place modifications. The kubectl scale command lets your instantaneously change the number of replicas you want to run your application.
This is possible and you have to use Prometheus Adaptor to configure custom rules to generate Custom Metrics.
This link has more details on how to setup prometheus, install adaptor and apply configuration with custom metrics..
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With