If I get an initial "Name or service not known" (EAI_NONAME), the next call to getaddrinfo() seems to go straight to DNS instead of checking the cache first (the nscd logs show no lookup attempts, and tcpdump shows traffic to the DNS server). If the first call succeeds in getting an address, all subsequent getaddrinfo() calls go to nscd first, as expected.
I'm compiling against glibc-2.13 for ARM Linux. In my rc.d, nscd is started before my daemon. nscd is set to disallow shared caches and to maintain a host cache. I am using the nscd from busybox (0.47). nsswitch.conf is set so that host lookups check cache/files/dns. host.conf is set to check files/bind.
My daemon is calling getaddrinfo().
I have debug logging enabled for nscd, and the logs show that the client that started to read the DNS response closes with a "Broken Pipe" error.
After that it will show GAI attempts from other daemons attempting to use the cache (so I know it's not nscd locked up or anything), but the daemon that got EAI_NONAME never again contacts nscd to do a cache lookup.
If I restart the daemon, I get the same behaviour whenever the first DNS query times out again.
Is there something in glibc that is invalidating my daemon's link to the cache? Is there a way to reconnect my daemon to the cache without restarting it (similar to forcing a resolv.conf re-load via res_init())?
To reduce the load on your DNS infrastructure, it's highly recommended to use the Name Service Caching Daemon (NSCD) on cluster nodes running Linux. This daemon will cache host, user, and group lookups and provide better resolution performance, and reduced load on DNS infrastructure.
This is the Name Service Cache Daemon. It takes care of group and password lookups for running programs and then caches the lookup results for the next query for services that can experience slowness in picking up changes such as NIS or LDAP.
nscd is already planned for deprecation in Fedora 34. The functionality it currently provides can be achieved by using systemd-resolved for DNS caching and the sssd daemon for everything else.
DESCRIPTION. Nscd caches libc-issued requests to the Name Service. If retrieving NSS data is fairly expensive, nscd is able to speed up consecutive access to the same data dramatically and increase overall system performance. Nscd should be run at boot time by /etc/init.
As alk mentions in his comment, retrying getaddrinfo() more than 100 times should force an nscd query.
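The retry approach mentioned above can be sketched roughly as follows. This is only an illustration: resolve_with_retries() and its parameters are hypothetical names, not part of glibc, and the loop does not reconnect to nscd itself. It simply keeps calling getaddrinfo(), so that once enough calls have been made glibc may try nscd again on its own.

```c
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a glibc function): call getaddrinfo() up to
   max_attempts times, retrying only on EAI_NONAME and EAI_AGAIN.
   Returns 0 on success, otherwise the last getaddrinfo() status.
   On success the caller must release *result with freeaddrinfo(). */
static int resolve_with_retries(const char *host,
                                const struct addrinfo *hints,
                                int max_attempts,
                                unsigned int delay_seconds,
                                struct addrinfo **result)
{
    int status = EAI_FAIL;

    *result = NULL;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        *result = NULL;
        status = getaddrinfo(host, NULL, hints, result);
        if (status == 0)
            return 0;                      /* resolved */
        if (status != EAI_NONAME && status != EAI_AGAIN)
            break;                         /* not a failure worth retrying */
        if (delay_seconds > 0 && attempt + 1 < max_attempts)
            sleep(delay_seconds);          /* optional pause between attempts */
    }
    return status;
}
```

A daemon could call this with a max_attempts somewhat above 100 and a small delay. Whether that is acceptable depends on how much extra DNS traffic the retries generate in the meantime.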
To understand why, let us take a quick peek into the flow of execution inside getaddrinfo().
getaddrinfo() calls gaih_inet().
gaih_inet() performs the following checks on __nss_not_use_nscd_hosts:
Checks whether it is greater than zero.
Increments it and checks whether it now exceeds the retry count NSS_NSCD_RETRY.
Only when both of the above conditions are satisfied is the counter reset to zero, and nscd is queried only while the counter is zero.
If a query to nscd fails, the counter is set back to 1, so nscd is ignored for roughly the next NSS_NSCD_RETRY times getaddrinfo() is called.
__nss_not_use_nscd_hosts is also modified internally by the nscd client code in the following places:
nscd/nscd_gethst_r.c, lines 178 and 189: reset to 1.
nscd/nscd_getai.c, lines 89 and 164: reset to 1.
nss/nsswitch.c, line 709: set to -1, i.e. nscd is disabled.
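The counter behaviour described above can be modelled in a few lines of C. This is a simplified sketch and not the glibc source: model_state stands in for __nss_not_use_nscd_hosts, MODEL_NSCD_RETRY assumes the limit of 100 mentioned in this answer, and all function names are made up for the illustration.

```c
#include <stdbool.h>

/* Assumed retry limit, standing in for NSS_NSCD_RETRY. */
#define MODEL_NSCD_RETRY 100

/* Stand-in for __nss_not_use_nscd_hosts:
     0  -> nscd is queried
    >0  -> nscd is being skipped; the value counts skipped lookups
    -1  -> nscd is disabled */
static int model_state = 0;

/* Models the check done before each host lookup: returns true when
   this lookup would be sent to nscd. */
static bool model_should_try_nscd(void)
{
    if (model_state > 0 && ++model_state > MODEL_NSCD_RETRY)
        model_state = 0;        /* enough lookups skipped, try nscd again */
    return model_state == 0;
}

/* Models a failed nscd query: start skipping nscd. */
static void model_nscd_failed(void)
{
    model_state = 1;
}

/* Models nscd being disabled: the counter never advances from -1. */
static void model_nscd_disable(void)
{
    model_state = -1;
}
```

In this model, after model_nscd_failed() the next 99 calls to model_should_try_nscd() return false and the 100th returns true, which matches the observation that a daemon stops talking to nscd after one failed lookup and only comes back after many further calls.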
Based on the above, it can be concluded that getaddrinfo() does NOT query nscd every single time. The internal state of nscd (determined by __nss_not_use_nscd_hosts) decides if getaddrinfo() ends up calling nscd or not.
To really force one's way around the 100-retry limitation, one could modify NSS_NSCD_RETRY and rebuild libc to deviate from the standard behaviour. However, I am not sure whether this would cause other unintended regressions.
Reference: Patch that introduced the __nss_not_use_nscd_hosts logic in getaddrinfo().