I have a Node.js app on two VM instances that I'm trying to load balance with network load balancing. To test that my servers are up and serving, the health check requests '/health.txt' on the app's internal listening port. The two instances are configured identically, with the same tags, firewall rules, etc., but the health check continuously fails for one of them. If I request the same file with curl, from my internal network or from outside, the test works fine on both instances; only the network load balancer reports one instance as down.
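For reference, the manual check is just a plain HTTP request for the file (my.pub.ip.addr and port 3000 below are placeholders for an instance's public IP and the app's listening port):

curl -v http://my.pub.ip.addr:3000/health.txt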
Using ngrep on the healthy instance (the invocation is sketched after the two traces below), I see:
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [S]
#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [AS]
#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [A]
#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [AP]
GET /health.txt HTTP/1.1.
Host: my.pub.ip.addr:3000.
.
#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [A]
#
T my.pub.ip.addr:3000 -> 169.254.169.254:65374 [AP]
HTTP/1.1 200 OK.
X-Powered-By: NitroPCR.
Accept-Ranges: bytes.
Date: Fri, 14 Nov 2014 20:00:40 GMT.
Cache-Control: public, max-age=86400.
Last-Modified: Thu, 24 Jul 2014 17:58:46 GMT.
ETag: W/"2198506076".
Content-Type: text/plain; charset=UTF-8.
Content-Length: 13.
Connection: keep-alive.
.
#
T 169.254.169.254:65374 -> my.pub.ip.addr:3000 [AR]
But on the instance GCE claims is unhealthy, I see this:
T 169.254.169.254:61179 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61179 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]
#
T 169.254.169.254:61180 -> my.pub.ip.addr:3000 [S]
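For reference, traces like the two above can be captured with an ngrep invocation along these lines (eth0 is an assumption for the interface; -W byline matches the per-line payload formatting shown above):

sudo ngrep -W byline -d eth0 port 3000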
But if I curl the same file from my healthy instance to the unhealthy one, the 'unhealthy' instance responds fine.
I got this back working after contacting the Google Compute Engine team. There is a service process on a GCE VM that needs to run at boot and keep running while the VM is alive. The process is named google-address-manager, and it should run at runlevels 0-6. For some reason this service stopped and would not start when one of my VMs booted/rebooted. Starting the service manually worked. Here are the steps we went through to determine what was wrong (this is a Debian VM):
sudo ip route list table all
This will display all of your routing tables. In the local table, there should be a route to your load balancer's public IP:
local lb.pub.ip.addr dev eth0 table local proto 66 scope host
If that route is missing, check whether google-address-manager is running:
sudo service google-address-manager status
If it is not running, start it:
sudo service google-address-manager start
If it starts OK, check your route table again; you should now have a route to your load balancer IP. You can also add this route manually:
sudo /sbin/ip route add to local lb.pub.ip.addr/32 dev eth0 proto 66
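Since these managed routes are stamped with proto 66 (as in the entry shown earlier), you can also list just the routes the address manager is responsible for, to confirm the one for the load balancer IP is in place:

sudo ip route list table local proto 66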
We still have not resolved why the address manager stopped and does not start on boot, but at least the LB pool is healthy again.
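One thing worth checking for the boot problem (an assumption on my part, using standard Debian sysvinit tooling rather than anything GCE-specific): make sure the init links for the service exist at the expected runlevels, and recreate them if they don't:

sudo update-rc.d google-address-manager defaults
find /etc/rc?.d -name '*google-address-manager*'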