We upgraded our development cluster from 1.13.6-gke.13 to 1.14.6-gke.13, and our pods can no longer reach our in-house network over our Google Cloud VPN. Our production cluster (still on 1.13) shares the same VPC network and VPN tunnels and is still working fine. The only thing that changed was the upgrade of the control plane and node pool to 1.14 on the development cluster.
I opened a shell into a pod on the development cluster and tried to ping the IP address of an in-house server we need to reach; there was no response. Doing the same from a pod in our production cluster works as expected.
I also ssh'd into a node in the development cluster and was able to ping the in-house network, so it's only pods that have the networking issue.
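For reference, this is roughly how I ran those checks. The pod name, namespace, node name, zone and in-house IP below are placeholders, and the container image needs a ping binary:

    # From a pod on the development cluster (placeholders throughout):
    kubectl exec -it some-app-pod -n default -- ping -c 3 192.168.10.20

    # Same check from a node over SSH (this one succeeds):
    gcloud compute ssh gke-dev-default-pool-1234abcd-xyz --zone us-central1-a -- ping -c 3 192.168.10.20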
Access to the publicly exposed services in the cluster is still working as expected. Health checks are OK.
UPDATE:
I created a new node pool using the latest 1.13 version, drained the pods from the 1.14 pool, and all is well again with the pods running on the 1.13 pool. Something is definitely up with 1.14. It remains to be seen whether this is an issue caused by some new configuration option or just a bug.
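Roughly what I did, with placeholder pool/cluster names, zone and version (pick an available 1.13 node version from gcloud container get-server-config):

    # Create a replacement pool on 1.13 (names, zone and version are examples).
    gcloud container node-pools create pool-113 \
      --cluster dev-cluster --zone us-central1-a \
      --node-version 1.13.12-gke.25 --num-nodes 3

    # Drain the 1.14 nodes so pods reschedule onto the 1.13 pool
    # (pool-114 is the placeholder name of the upgraded pool).
    for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=pool-114 -o jsonpath='{.items[*].metadata.name}'); do
      kubectl drain "$node" --ignore-daemonsets --delete-local-data
    done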
RESOLUTION:
IP masquerading is discussed here: https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent. My resolution was to add the pod subnets for each of my clusters to the list of advertised networks in my VPN Cloud Routers on GCP, so the pod networks can now traverse the VPN.
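The change on the Cloud Router side looks roughly like this. Router name, region and CIDR ranges are placeholders; note that switching to custom advertisements replaces the defaults, which is why the subnet group is re-added explicitly:

    # Advertise the cluster pod ranges over BGP in addition to the VPC subnets.
    gcloud compute routers update my-vpn-router \
      --region us-central1 \
      --advertisement-mode custom \
      --set-advertisement-groups ALL_SUBNETS \
      --set-advertisement-ranges 10.32.0.0/14,10.36.0.0/14

The on-premises side also has to accept and route those pod CIDRs back over the tunnel.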
Up to GKE 1.13.x, even when it was not strictly necessary, GKE masqueraded traffic from pods to external IPs, even for destinations on the cluster's own VPC, unless the destination was in the 10.0.0.0/8 range.
Since 1.14.x, that masquerade rule is no longer added by default on clusters, so pods reaching any endpoint are now seen with their pod IP instead of the node IP.
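If you want to verify this on your own nodes, a quick read-only check of the NAT rules (run over SSH on a node; the grep is just a filter):

    # On a 1.13 node this lists a broad MASQUERADE rule for destinations
    # outside 10.0.0.0/8; on a 1.14 node that rule is gone.
    sudo iptables -t nat -S | grep -i masquerade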
You could try recreating your Cloud VPN so that it includes the pod IP range.
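The masquerading behavior can also be controlled explicitly with the ip-masq-agent from the doc linked above. A minimal sketch, assuming the agent is running in the cluster and using example CIDRs: traffic to any destination not listed in nonMasqueradeCIDRs is SNAT'ed to the node IP, so the on-premises range is deliberately left off the list:

    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ip-masq-agent
      namespace: kube-system
    data:
      config: |
        nonMasqueradeCIDRs:
          - 10.32.0.0/14   # pod range (example)
          - 10.36.0.0/20   # services range (example)
        resyncInterval: 60s
    EOF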