
kubectl: Error from server: No SSH tunnels currently open

I'm running some containers on Google Container Engine. One day everything was fine, and the next day I could no longer attach to my containers, or run exec or any other Docker command against them.

I deleted the pods and let new ones be created, which didn't help. Then I deleted the node and waited for a new one to be created and the pods to be deployed, which didn't help either.

$ kubectl attach www-controller-dev-xxxxx

Error from server: No SSH tunnels currently open. Were the targets able to accept an ssh-key for user "gke-xxxxxxxxxxxxxxxxxxxxxxxx"?

What else can I try?

The problem might have started after I deleted the cluster and recreated it, but I can't be sure. I had done that before and it was never a problem.

DiDiev asked Apr 02 '16 15:04


2 Answers

Commands like attach rely on the cluster's master being able to talk to the nodes in the cluster. However, because the master isn't in the same Compute Engine network as your cluster's nodes, we rely on SSH tunnels to enable secure communication.

Container Engine puts an SSH public key in your Compute Engine project metadata. All Compute Engine VMs using Google-provided images regularly check their project's common metadata and their instance's metadata for SSH keys to add to the VM's list of authorized users. Container Engine also adds a firewall rule to your Compute Engine network allowing SSH access from the master's IP address to each node in the cluster.

If kubectl attach (or logs, exec, and port-forward) doesn't work, the likely cause is that the master is unable to open SSH tunnels to the nodes. To determine the underlying problem, check for these potential causes:

  1. The cluster doesn't have any nodes.

    If you've scaled down the number of nodes in your cluster to zero, SSH tunnels won't work.

    To fix it, resize your cluster to have at least one node.

  2. Pods in the cluster have gotten stuck in a terminating state and prevented nodes that no longer exist from being removed from the cluster.

    This is an issue that should only affect Kubernetes version 1.1, but could be caused by repeated resizing of the cluster down and up.

    To fix it, delete the pods that have been in a terminating state for more than a few minutes. The old nodes will then be removed from the master's API and replaced by the new nodes.

  3. Your network's firewall rules don't allow SSH access from the master to the nodes.

    All Compute Engine networks are created with a firewall rule called "default-allow-ssh" that allows SSH access from all IP addresses (requiring a valid private key, of course). Container Engine also inserts an SSH rule for each cluster of the form "gke-<cluster_name>-<short_uid>-ssh" that allows SSH access specifically from the cluster's master IP to the cluster's nodes. If neither of these rules exists, then the master will be unable to open SSH tunnels.

    To fix it, re-add a firewall rule allowing access to VMs with the tag that's on all the cluster's nodes from the master's IP address.

  4. Your project's common metadata entry for sshKeys is full.

    If the project's metadata entry named "sshKeys" is close to the 32KiB size limit, then Container Engine isn't able to add its own SSH key to let it open SSH tunnels. To see your project's metadata, run gcloud compute project-info describe [--project=PROJECT] and check the length of the list of sshKeys.

    To fix it, delete some of the SSH keys that are no longer needed.

  5. You have set a metadata field with the key "sshKeys" on the VMs in the cluster.

    The node agent on VMs prefers per-instance sshKeys to project-wide SSH keys, so if you've set any SSH keys specifically on the cluster's nodes, then the master's SSH key in the project metadata won't be respected by the nodes. To check, run gcloud compute instances describe <VM-name> and look for an "sshKeys" field in the metadata.

    To fix it, delete the per-instance SSH keys from the instance metadata.
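
The five checks above can be sketched as small shell helpers, one per cause. This is only an illustration, not something Container Engine ships: the helper names and the node/zone arguments are hypothetical, the metadata projections assume gcloud resolves `items.sshKeys` to the value stored under that key, and the byte count is approximate.

```bash
# Sketch only: one helper per cause from the list above.
# Helper names and the NODE/ZONE arguments are hypothetical.

# Cause 1: number of nodes registered with the master (0 means resize the cluster).
count_nodes() {
  kubectl get nodes --no-headers 2>/dev/null | wc -l | tr -d ' '
}

# Cause 2: pods whose STATUS column reads Terminating, printed as namespace/name.
terminating_pods() {
  kubectl get pods --all-namespaces --no-headers \
    | awk '$4 == "Terminating" { print $1 "/" $2 }'
}

# Cause 3: names of firewall rules that look like the per-cluster SSH rule.
gke_ssh_rules() {
  gcloud compute firewall-rules list \
    --filter="name~'gke-.*-ssh'" --format="value(name)"
}

# Cause 4: approximate size in bytes of the project-wide sshKeys entry
# (the limit mentioned above is 32KiB, i.e. 32768 bytes).
project_ssh_keys_bytes() {
  gcloud compute project-info describe \
    --format="value(commonInstanceMetadata.items.sshKeys)" | wc -c | tr -d ' '
}

# Cause 5: per-instance sshKeys on one node; empty output means none are set.
# $1 = node VM name, $2 = zone (both supplied by the caller).
instance_ssh_keys() {
  gcloud compute instances describe "$1" --zone "$2" \
    --format="value(metadata.items.sshKeys)"
}
```

Running the helpers in order (count_nodes, terminating_pods, gke_ssh_rules, project_ssh_keys_bytes, then instance_ssh_keys for each node) follows the same sequence as the list, so the first one that returns something unexpected points at the likely cause.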

It's worth noting that these features are not required for the correct functioning of the cluster. If you prefer to keep your cluster's network locked down from all outside access, that's perfectly fine. Just be aware that features like these won't work as a result.

Alex Robinson answered Oct 22 '22 19:10


Another thing to consider: regional clusters. Regional clusters have multiple masters and the cluster endpoint is a load balancer. So, review your network routes!

If you have a VM instance acting as a NAT gateway, you may have routes forcing traffic through it. In that case, you must exclude the traffic between the Kubernetes masters and the nodes from these routes.

You can find your master IPs by inspecting the firewall rule named gke-<cluster_name>-<short_uid>-ssh, then create routes to bypass the NAT gateway.

Here are gcloud commands to find the GKE master IPs:

```bash
FW_RULE_GKE_SSH=$(gcloud compute firewall-rules list --filter="name~'gke-.*-ssh'" --format="get(name)")
GKE_MASTER_IP=$(gcloud compute firewall-rules describe "${FW_RULE_GKE_SSH}" --format='value(sourceRanges)')
```
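
Once GKE_MASTER_IP is known, the bypass routes could be generated along these lines. This is a rough sketch rather than the Terraform module's actual fix: the function name, the route name prefix, and the priority of 700 are all hypothetical, and it assumes multiple ranges arrive separated by ';' or ','. It only prints the commands so they can be reviewed first.

```bash
# Sketch: print (not run) one "routes create" command per master range.
# The route name prefix and priority 700 are hypothetical; choose a priority
# number lower than the NAT route's so these routes take precedence.
bypass_route_cmds() {
  ranges="$1"    # e.g. "$GKE_MASTER_IP" from the commands above
  network="$2"   # VPC network the cluster's nodes are attached to
  i=0
  for range in $(printf '%s' "$ranges" | tr ';,' '  '); do
    i=$((i + 1))
    printf 'gcloud compute routes create gke-master-bypass-%s' "$i"
    printf ' --network=%s --destination-range=%s' "$network" "$range"
    printf ' --next-hop-gateway=default-internet-gateway --priority=700\n'
  done
}
```

For example, `bypass_route_cmds "$GKE_MASTER_IP" default` prints one command per master range; after checking the output, it can be piped to `sh` to create the routes.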

Credit goes to this issue and its fix in the Terraform NAT gateway module: https://github.com/GoogleCloudPlatform/terraform-google-nat-gateway/issues/25

Seboudry answered Oct 22 '22 21:10