Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compute Engine VM lost network connectivity

We have a VM in Compute Engine running CentOs 8. This VM is running for a long time and is never restarted. Last night, suddenly we lost connectivity to the VM through internal as well as external IPs. SSH was also not being possible.

On the serial port, we observed below logs:

Apr 29 15:53:18 <vm-name> google_osconfig_agent[1215]: default

Apr 29 16:05:18 <vm-name> google_osconfig_agent[1215]: default

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3120] dhcp4 (eth0): option dhcp_lease_time      => '3600'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option domain_name          => 'us-central1-a.c.<project-name>.internal'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option domain_name_servers  => '169.254.169.254'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option domain_search        => 'us-central1-a.c.<project-name>.internal c.<project-name>.internal google.internal'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option expiry               => '1619716266'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option host_name            => '<vm-name>.us-central1-a.c.<project-name>.internal'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option interface_mtu        => '1460'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option ip_address           => '10.128.0.4'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option next_server          => '10.128.0.1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option ntp_servers          => '169.254.169.254'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option requested_broadcast_address => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3124] dhcp4 (eth0): option requested_domain_name => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_domain_name_servers => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_domain_search => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_host_name  => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_interface_mtu => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_ms_classless_static_routes => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_nis_domain => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_nis_servers => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_ntp_servers => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_rfc3442_classless_static_routes => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3125] dhcp4 (eth0): option requested_root_path  => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option requested_routers    => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option requested_static_routes => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option requested_subnet_mask => '1'

Apr 29 16:11:06 <vm-name> dbus-daemon[827]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.8' (uid=0 pid=907 comm="/usr/sbin/NetworkManager --no-daemon " label="system_u:system_r:NetworkManager_t:s0")

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option requested_time_offset => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option requested_wpad       => '1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option rfc3442_classless_static_routes => '10.128.0.1/32 0.0.0.0 0.0.0.0/0 10.128.0.1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option routers              => '10.128.0.1'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): option subnet_mask          => '255.255.255.255'

Apr 29 16:11:06 <vm-name> NetworkManager[907]: <info>  [1619712666.3126] dhcp4 (eth0): state changed extended -> extended

Apr 29 16:11:06 <vm-name> systemd[1]: Starting Network Manager Script Dispatcher Service...

Apr 29 16:11:06 <vm-name> dbus-daemon[827]: [system] Successfully activated service 'org.freedesktop.nm_dispatcher'

Apr 29 16:11:06 <vm-name> systemd[1]: Started Network Manager Script Dispatcher Service.

Apr 29 16:15:18 <vm-name> google_osconfig_agent[1215]: default

Apr 29 16:29:30 <vm-name> GCEGuestAgent[1269]: 2021-04-29T16:28:35.1302Z GCEGuestAgent Error main.go:181: Error watching metadata: Get http://metadata.google.internal/computeMetadata/v1//?recursive=true&alt=json&wait_for_change=true&timeout_sec=60&last_etag=4ac15b8179731d72: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

Apr 29 16:29:31 <vm-name> OSConfigAgent[1215]: 2021-04-29T16:29:03.9464Z OSConfigAgent Error main.go:189: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: i/o timeout

Apr 29 16:29:44 <vm-name> google_osconfig_agent[1215]: default

[16281276.978048] google_guest_agent[1269]: 2021/04/29 16:41:10 logging client: context deadline exceeded

Apr 29 17:11:14 <vm-name> google_guest_agent[1269]: 2021/04/29 16:41:10 logging client: context deadline exceeded

Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: 2021/04/29 16:42:42 logging client: context deadline exceeded

Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: default

Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:03:31 logging client: context deadline exceeded

Apr 29 17:23:19 <vm-name> google_osconfig_agent[1215]: default

Apr 29 17:27:20 <vm-name> OSConfigAgent[1215]: 2021-04-29T16:41:09.0306Z OSConfigAgent Error main.go:189: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: i/o timeout

Apr 29 17:27:20 <vm-name> NetworkManager[907]: <info>  [1619717130.6935] dhcp4 (eth0): option dhcp_lease_time      => '3600'

Apr 29 17:27:20 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:26:00 logging client: context deadline exceeded

Apr 29 17:27:20 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:07:12.7039Z OSConfigAgent Error main.go:189: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: i/o timeout

Apr 29 17:27:20 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:22:13.2740Z OSConfigAgent Error main.go:189: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable

Apr 29 17:28:55 <vm-name> NetworkManager[907]: <info>  [1619717199.5681] dhcp4 (eth0): option domain_name          => 'us-central1-a.c.<project-name>.internal'

Apr 29 17:34:48 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:27:54.8816Z OSConfigAgent Error main.go:189: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable

Apr 29 17:39:16 <vm-name> google_osconfig_agent[1215]: default

Apr 29 17:39:16 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:36:13 logging client: context deadline exceeded

Apr 29 17:39:16 <vm-name> google_osconfig_agent[1215]: default

Apr 29 17:55:16 <vm-name> NetworkManager[907]: <info>  [1619717287.9321] dhcp4 (eth0): option domain_name_servers  => '169.254.169.254'

Apr 29 17:58:36 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:30:47.0872Z OSConfigAgent Error main.go:189: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable

Apr 29 18:03:20 <vm-name> google_osconfig_agent[1215]: default

Apr 29 18:03:20 <vm-name> google_osconfig_agent[1215]: 2021/04/29 17:46:22 logging client: context deadline exceeded

Restarting the machine restored the network. We cannot find anything else in the logs, and the monitoring also doesn't show anything suspicious. What could have caused this?

like image 981
Ankur Gupta Avatar asked Mar 03 '26 06:03

Ankur Gupta


1 Answers

This OSConfigAgent Error seems to be the cause of the issue:

Apr 29 17:27:20 <vm-name> OSConfigAgent[1215]: 2021-04-29T17:22:13.2740Z OSConfigAgent Error main.go:189: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=4ac15b8179731d72&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable

According to the GCP documentation:

Google Cloud runs a local metadata server alongside each instance at 169.254.169.254. This server is essential to the operation of the instance, so the instance can access it regardless of any firewall rules that you configure.

Based on that, there must be something interesting to look at within /var/log/messages. It is possible that you find something related to the Network adapter.

like image 88
Lalo P Avatar answered Mar 06 '26 02:03

Lalo P



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!