I have an existing GKE cluster that was created from a Terraform config I got from a tutorial on GitHub.
The cluster has a default node pool with 3 nodes.
I tried to add another node pool with 3 nodes through the GKE console,
but when I do kubectl get nodes
I only see 4 nodes, not 6.
When I tried the same through the gcloud command line,
I remember seeing a message about IP space.
It seems like I cannot have 6 nodes because of IP space.
How can I change the IP space of my existing cluster?
I did some research on this and it seems like it cannot be changed for an existing cluster in GKE?
How and where can I set this IP space for a new cluster then?
UPDATE:
I found the error message in my notifications in GCP:
(1) deploy error: Not all instances running in IGM after 19.314823406s. Expect 1. Current errors: [IP_SPACE_EXHAUSTED]: Instance '--6fa3ebb6-cw6t' creation failed: IP space of 'projects//regions/us-east4/subnetworks/-pods-4851bf1518184e60' is exhausted. (2) deploy error: Not all instances running in IGM after 19.783096708s. Expect 1. Current errors: [IP_SPACE_EXHAUSTED]: Instance '-spec--bf111c8e-h8mm' creation failed: IP space of 'projects//regions/us-east4/subnetworks/-pods-4851bf1518184e60' is exhausted.
You can change the cluster's Service IP range by modifying the /etc/kubernetes/manifests/kube-controller-manager.yaml file (note that this only applies to self-managed control planes; on GKE the control plane is managed by Google). However, you should cordon off the affected nodes and delete all Services/Deployments so they can be recreated. You may also need to change the settings of your CNI plugin.
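On a self-managed (e.g. kubeadm) control plane that would look roughly like this (the node name is a placeholder; on GKE these manifests are not accessible because Google manages the control plane):

kubectl cordon <node-name>                      # mark the node unschedulable
kubectl drain <node-name> --ignore-daemonsets   # evict workloads so they get recreated elsewhere
# then, on the control-plane node, adjust the --service-cluster-ip-range
# (and --cluster-cidr for Pods) flags in the static manifest:
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml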
Service (cluster IP) addresses are taken from the cluster's subnet's secondary IP address range for Services. You must ensure this range is large enough to provide addresses for all the Kubernetes Services you host in your cluster. For a cluster that runs up to 3000 Services, you need 3000 cluster IP addresses.
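As an illustration (the cluster name, location, and CIDR below are placeholders, not values from the question), you can inspect the ranges an existing cluster uses, and you set the Services range when creating a new VPC-native cluster:

# show the Pod and Services secondary ranges of an existing cluster
gcloud container clusters describe my-cluster --region us-east4 \
  --format="yaml(ipAllocationPolicy)"

# choose the Services range explicitly when creating a new cluster
gcloud container clusters create my-new-cluster --region us-east4 \
  --enable-ip-alias \
  --services-ipv4-cidr 10.2.0.0/20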
To find the cluster-internal IP address of a Kubernetes Pod, run the kubectl get pod command on your local machine with the -o wide option. This option lists more information, including the node the Pod resides on, and the IP column will contain the internal IP address of each Pod.
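For example (output abridged and illustrative):

kubectl get pods -o wide
# NAME                   READY   STATUS    IP          NODE
# web-7d4b9c6f5d-x2k8q   1/1     Running   10.1.92.7   gke-my-cluster-default-pool-abc123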
I have figured out the issue.
Background on the issue can be read in detail here.
Specifically, the part:
"...if you set the maximum Pods per node to 30 then, per the table above, a /26 CIDR range is used, and each Node is assigned 64 IP addresses. If you do not configure the maximum number of Pods per node, a /24 CIDR range is used, and each node is assigned 256 IP addresses..."
I had deployed this cluster through a Terraform demo, so I am not sure how to make the changes through the GCP Console or the command line.
I made changes to the Terraform config, which resolved the issue.
In my Terraform configuration, a variable called kubernetes_pods_ipv4_cidr was set to 10.1.92.0/22.
This meant that the range 10.1.92.0 – 10.1.95.255 was assigned to the cluster's nodes for Pods.
According to the GCP documentation, by default a node will have a maximum of 110 Pods and be assigned 256 IP addresses (a /24) for them.
Hence, with the default maximum Pods per node, there can only be 4 nodes in my cluster, since each node is assigned 256 of the Pod IP addresses.
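As a back-of-the-envelope check (using the /24-per-node figure from the documentation quoted above):

# a /22 Pod range contains 2^(32-22) = 1024 addresses
# the default of 110 max Pods per node reserves a /24 = 256 addresses per node
echo $(( 1024 / 256 ))   # 4 -> at most 4 nodes fit in the /22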
I added a new field, default_max_pods_per_node, to my Terraform config to reduce this maximum from the default of 110 to 55:
resource "google_container_cluster" "my-cluster" {
provider = "google-beta"
name = "my-cluster"
project = "${local.my-cluster_project_id}"
location = "${var.region}"
default_max_pods_per_node = 55
After that, my cluster was able to support more nodes.
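To see why, per the per-node sizing table in the GKE documentation, a maximum of 55 Pods per node means each node reserves a /25 (128 addresses) instead of a /24:

# 55 max Pods per node -> a /25 = 128 addresses reserved per node
echo $(( 1024 / 128 ))   # 8 -> the same 10.1.92.0/22 range now fits up to 8 nodes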
Alternatively, you can also change the IP range assigned to kubernetes_pods_ipv4_cidr.
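How that variable is wired up depends on the demo's module, but in a typical VPC-native cluster it ends up in the ip_allocation_policy block of google_container_cluster. A rough sketch of widening it (the CIDRs are examples; note that the Pod range of an existing cluster cannot be changed in place, so Terraform will recreate the cluster):

resource "google_container_cluster" "my-cluster" {
  # ...

  ip_allocation_policy {
    # e.g. widen the Pod range from /22 to /20 (4096 addresses instead of 1024)
    cluster_ipv4_cidr_block  = "10.1.80.0/20"
    services_ipv4_cidr_block = "10.2.0.0/20"
  }
}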
The error message you are seeing isn't saying that your GKE cluster is out of IP space (which can happen if you create a cluster with a small CIDR range for Pod IPs), but rather that the underlying GCP network in which the cluster exists is out of space. If you look at the subnetwork where the cluster is running (it looks like it's called -pods-4851bf1518184e60 based on your error message), you should see that it doesn't have sufficient space to add additional nodes.
You can confirm that this is the problem by deleting the new node pool and trying to scale the original node pool from 3 to 6 nodes.
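Something along these lines (cluster, pool, and location names are placeholders):

gcloud container node-pools delete my-new-pool \
  --cluster my-cluster --region us-east4

gcloud container clusters resize my-cluster \
  --node-pool default-pool --num-nodes 6 --region us-east4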
I don't recall if there is a way to expand the size of a subnet dynamically. If so, then you can add IP space to the subnet to add nodes. If not, you will need to create a new cluster in a larger subnet.
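For what it's worth, a subnet's primary range can be grown in place with gcloud (subnet name and prefix length below are placeholders); secondary ranges, which GKE uses for Pod IPs, generally cannot be expanded this way:

# inspect the subnet's current primary and secondary ranges
gcloud compute networks subnets describe my-subnet --region us-east4

# expand the primary range to a shorter prefix, e.g. /20
gcloud compute networks subnets expand-ip-range my-subnet \
  --region us-east4 --prefix-length 20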