aiohttp concurrent GET requests lead to ClientConnectorError(8, 'nodename nor servname provided, or not known')

I am stumped by a problem seemingly related to asyncio + aiohttp whereby, when sending a large number of concurrent GET requests, over 85% of the requests raise an aiohttp.client_exceptions.ClientConnectorError exception that ultimately stems from

socket.gaierror(8, 'nodename nor servname provided, or not known')

while sending single GET requests or doing the underlying DNS resolution on the host/port does not raise this exception.

While in my real code I'm doing a good amount of customization such as using a custom TCPConnector instance, I can reproduce the issue using just the "default" aiohttp class instances & arguments, exactly as below.

I've followed the traceback and the root of the exception is related to DNS resolution. It comes from the _create_direct_connection method of aiohttp.TCPConnector, which calls ._resolve_host().
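One way to check whether aiohttp is involved at all is to call the same event-loop method that appears at the bottom of the traceback, loop.getaddrinfo(), for many hosts at once. This is only a minimal sketch: the helper names resolve and resolve_all are hypothetical, and the default port of 443 is an assumption.

```python
import asyncio
import socket


async def resolve(host: str, port: int):
    """Resolve one host via the event loop; return the address infos or the gaierror."""
    loop = asyncio.get_running_loop()
    try:
        return await loop.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Return rather than raise so that one bad host does not hide the rest.
        return exc


async def resolve_all(hosts, port: int = 443):
    """Start every lookup at once, mirroring an unbounded asyncio.gather()."""
    hosts = list(hosts)
    results = await asyncio.gather(*(resolve(h, port) for h in hosts))
    failed = [h for h, r in zip(hosts, results) if isinstance(r, socket.gaierror)]
    return results, failed
```

Running asyncio.run(resolve_all(hosts)) over the hostnames from the input file and comparing len(failed) against the aiohttp failure count should indicate whether the errors follow the resolver rather than the HTTP client.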

I have also tried:

  • Using (and not using) aiodns
  • sudo killall -HUP mDNSResponder
  • Using family=socket.AF_INET as an argument to TCPConnector (though I am fairly sure this is used by aiodns anyway). This passes 2 rather than the default value of 0 for that parameter
  • With ssl=True and ssl=False

All to no avail.

Full code to reproduce is below. The input URLs are at https://gist.github.com/bsolomon1124/fc625b624dd26ad9b5c39ccb9e230f5a.

import asyncio
import itertools

import aiohttp
import aiohttp.client_exceptions

from yarl import URL

ua = itertools.cycle(
    (
        "Mozilla/5.0 (X11; Linux i686; rv:64.0) Gecko/20100101 Firefox/64.0",
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.10; rv:62.0) Gecko/20100101 Firefox/62.0",
        "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.13; ko; rv:1.9.1b2) Gecko/20081201 Firefox/60.0",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36"
    )
)

async def get(url, session) -> str:
    async with session.request(
        "GET",
        url=url,
        raise_for_status=True,
        headers={'User-Agent': next(ua)},
        ssl=False
    ) as resp:
        text = await resp.text(encoding="utf-8", errors="replace")
        print("Got text for URL", url)
        return text

async def bulk_get(urls) -> list:
    async with aiohttp.ClientSession() as session:
        htmls = await asyncio.gather(
            *(
                get(url=url, session=session)
                for url in urls
            ),
            return_exceptions=True  # collect exceptions instead of raising
        )
        return htmls

# See https://gist.github.com/bsolomon1124/fc625b624dd26ad9b5c39ccb9e230f5a
with open("/path/to/urls.txt") as f:
    urls = tuple(URL(i.strip()) for i in f)

res = asyncio.run(bulk_get(urls))  # urls: Tuple[yarl.URL]

c = 0
for i in res:
    if isinstance(i, aiohttp.client_exceptions.ClientConnectorError):
        c += 1

print(c)  # 21205 !!!!! (85% failure rate)
print(len(urls))  # 24934

Printing each exception string from res looks like:

Cannot connect to host sigmainvestments.com:80 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host giaoducthoidai.vn:443 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host chauxuannguyen.org:80 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host www.baohomnay.com:443 ssl:False [nodename nor servname provided, or not known]
Cannot connect to host www.soundofhope.org:80 ssl:False [nodename nor servname provided, or not known]
# And so on...

What's frustrating is that I can ping these hosts with no problem and even call the underlying ._resolve_host():


 [~/] $ ping -c 5 www.hongkongfp.com
PING www.hongkongfp.com ( 56 data bytes
64 bytes from icmp_seq=0 ttl=56 time=11.667 ms
64 bytes from icmp_seq=1 ttl=56 time=12.169 ms
64 bytes from icmp_seq=2 ttl=56 time=12.135 ms
64 bytes from icmp_seq=3 ttl=56 time=12.235 ms
64 bytes from icmp_seq=4 ttl=56 time=14.252 ms

--- www.hongkongfp.com ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.667/12.492/14.252/0.903 ms


In [1]: import asyncio 
   ...: from aiohttp.connector import TCPConnector 
   ...: from clipslabapp.ratemgr import default_aiohttp_tcpconnector 
   ...: async def main(): 
   ...:     conn = default_aiohttp_tcpconnector() 
   ...:     i = await asyncio.create_task(conn._resolve_host(host='www.hongkongfp.com', port=443)) 
   ...:     return i 
   ...: i = asyncio.run(main())                                                                                                                               

In [2]: i                                                                                                                                                     
[{'hostname': 'www.hongkongfp.com',
  'host': '',
  'port': 443,
  'family': <AddressFamily.AF_INET: 2>,
  'proto': 6,
  'flags': <AddressInfo.AI_NUMERICHOST: 4>},
 {'hostname': 'www.hongkongfp.com',
  'host': '',
  'port': 443,
  'family': <AddressFamily.AF_INET: 2>,
  'proto': 6,
  'flags': <AddressInfo.AI_NUMERICHOST: 4>}]

My setup:

  • Python 3.7.1
  • aiohttp 3.5.4
  • Occurs on Mac OSX High Sierra and Ubuntu 18.04

Information on the exception itself:

The exception is aiohttp.client_exceptions.ClientConnectorError, which wraps socket.gaierror as the underlying OSError.
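Because the wrapper is itself an OSError carrying the wrapped error's errno (the repr in the title shows the same code 8), the results of a gather() call can be grouped by errno instead of counted one exception class at a time. A minimal sketch using only the standard library; the helper name tally_os_errors is hypothetical.

```python
import socket
from collections import Counter


def tally_os_errors(results):
    """Count the OSError instances in `results` by errno; skip everything else."""
    counts = Counter()
    for item in results:
        if isinstance(item, OSError):
            counts[item.errno] += 1
    return counts
```

Applied to the list returned by bulk_get, this would separate resolver failures (errno 8 on macOS) from other OS-level errors in a single pass.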

Since I have return_exceptions=True in asyncio.gather(), I can get the exception instances themselves for inspection. Here is one example:

In [18]: i
Out[18]: aiohttp.client_exceptions.ClientConnectorError(8,
                                               'nodename nor servname provided, or not known')

In [19]: i.host, i.port
Out[19]: ('www.hongkongfp.com', 443)

In [20]: i._conn_key
Out[20]: ConnectionKey(host='www.hongkongfp.com', port=443, is_ssl=True, ssl=False, proxy=None, proxy_auth=None, proxy_headers_hash=None)

In [21]: i._os_error
Out[21]: socket.gaierror(8, 'nodename nor servname provided, or not known')

In [22]: raise i.with_traceback(i.__traceback__)
gaierror                                  Traceback (most recent call last)
~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_direct_connection(self, req, traces, timeout, client_error)
    954                 port,
--> 955                 traces=traces), loop=self._loop)
    956         except OSError as exc:

~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _resolve_host(self, host, port, traces)
    824                 addrs = await \
--> 825                     self._resolver.resolve(host, port, family=self._family)
    826                 if traces:

~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/resolver.py in resolve(self, host, port, family)
     29         infos = await self._loop.getaddrinfo(
---> 30             host, port, type=socket.SOCK_STREAM, family=family)

/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py in getaddrinfo(self, host, port, family, type, proto, flags)
    772         return await self.run_in_executor(
--> 773             None, getaddr_func, host, port, family, type, proto, flags)

/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py in run(self)
     56         try:
---> 57             result = self.fn(*self.args, **self.kwargs)
     58         except BaseException as exc:

/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/socket.py in getaddrinfo(host, port, family, type, proto, flags)
    747     addrlist = []
--> 748     for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    749         af, socktype, proto, canonname, sa = res

gaierror: [Errno 8] nodename nor servname provided, or not known

The above exception was the direct cause of the following exception:

ClientConnectorError                      Traceback (most recent call last)
<ipython-input-22-72402d8c3b31> in <module>
----> 1 raise i.with_traceback(i.__traceback__)

<ipython-input-1-2bc0f5172de7> in get(url, session)
     19         raise_for_status=True,
     20         headers={'User-Agent': next(ua)},
---> 21         ssl=False
     22     ) as resp:
     23         return await resp.text(encoding="utf-8", errors="replace")

~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/client.py in _request(self, method, str_or_url, params, data, json, cookies, headers, skip_auto_headers, auth, allow_redirects, max_redirects, compress, chunked, expect100, raise_for_status, read_until_eof, proxy, proxy_auth, timeout, verify_ssl, fingerprint, ssl_context, ssl, proxy_headers, trace_request_ctx)
    474                                 req,
    475                                 traces=traces,
--> 476                                 timeout=real_timeout
    477                             )
    478                     except asyncio.TimeoutError as exc:

~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in connect(self, req, traces, timeout)
    521             try:
--> 522                 proto = await self._create_connection(req, traces, timeout)
    523                 if self._closed:
    524                     proto.close()

~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_connection(self, req, traces, timeout)
    852         else:
    853             _, proto = await self._create_direct_connection(
--> 854                 req, traces, timeout)
    856         return proto

~/Scripts/python/projects/clab/lib/python3.7/site-packages/aiohttp/connector.py in _create_direct_connection(self, req, traces, timeout, client_error)
    957             # in case of proxy it is not ClientProxyConnectionError
    958             # it is problem of resolving proxy ip itself
--> 959             raise ClientConnectorError(req.connection_key, exc) from exc
    961         last_exc = None  # type: Optional[Exception]

ClientConnectorError: Cannot connect to host www.hongkongfp.com:443 ssl:False [nodename nor servname provided, or not known]

Why do I not think this is a problem with DNS resolution at the OS level itself?

I can successfully ping the IP address of my ISP’s DNS Servers, which are given in (Mac OSX) System Preferences > Network > DNS:

 [~/] $ ping -c 2
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=57 time=16.478 ms
64 bytes from icmp_seq=1 ttl=57 time=21.042 ms

--- ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 16.478/18.760/21.042/2.282 ms
 [~/] $ ping -c 2
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=54 time=33.904 ms
64 bytes from icmp_seq=1 ttl=54 time=32.788 ms

--- ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 32.788/33.346/33.904/0.558 ms

 [~/] $ ping6 -c 2 2001:558:feed::1
PING6(56=40+8+8 bytes) 2601:14d:8b00:7d0:6587:7cfc:e2cc:82a0 --> 2001:558:feed::1
16 bytes from 2001:558:feed::1, icmp_seq=0 hlim=57 time=14.927 ms
16 bytes from 2001:558:feed::1, icmp_seq=1 hlim=57 time=14.585 ms

--- 2001:558:feed::1 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 14.585/14.756/14.927/0.171 ms
 [~/] $ ping6 -c 2 2001:558:feed::2
PING6(56=40+8+8 bytes) 2601:14d:8b00:7d0:6587:7cfc:e2cc:82a0 --> 2001:558:feed::2
16 bytes from 2001:558:feed::2, icmp_seq=0 hlim=54 time=12.694 ms
16 bytes from 2001:558:feed::2, icmp_seq=1 hlim=54 time=11.555 ms

--- 2001:558:feed::2 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 11.555/12.125/12.694/0.569 ms
1 Answer

After some further investigation, this issue does not appear to be directly caused by aiohttp/asyncio, but rather by limits stemming from both:

  • The capacity/rate-limiting of your DNS Servers
  • The max number of open files at the system level.

Firstly, for those looking to get some beefed-up DNS servers (I will probably not go that route), the big-name options seem to be:

  • Cloudflare
  • Google Public DNS
  • Amazon Route 53

(Good intro to DNS for those like me for whom network concepts are lacking.)

The first thing I did was run the script above on a beefed-up AWS EC2 instance: an h1.16xlarge running Ubuntu, which is I/O optimized. I can't say that this in itself helped, but it certainly can't have hurt. I'm not too familiar with the default DNS server used by an EC2 instance, but the OSError with errno == 8 from above went away when I reran the script there.

However, that presented a new exception in its place, OSError with code 24, "Too many open files." My hotfix solution (not arguing this is the most sustainable or safest) was to increase the max file limits. I did this via:

sudo vim /etc/security/limits.conf
# Add these lines
root    soft    nofile  100000
root    hard    nofile  100000
ubuntu    soft    nofile  100000
ubuntu    hard    nofile  100000

sudo vim /etc/sysctl.conf
# Add this line
fs.file-max = 2097152

sudo sysctl -p

sudo vim /etc/pam.d/common-session
# Add this line
session required pam_limits.so

sudo reboot
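After the reboot, it may be worth confirming which limit the Python process actually sees, since the value can differ by user and by how the process was started. A minimal sketch using the standard-library resource module (Unix only); the helper names and the headroom of 64 descriptors are assumptions for illustration.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_soft_limit(concurrency: int, headroom: int = 64) -> bool:
    """True if `concurrency` sockets plus some spare descriptors fit under the soft limit.

    `headroom` is an arbitrary allowance for stdin/stdout, log files, and so on.
    """
    soft, _hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return True
    return concurrency + headroom <= soft
```

Calling fits_within_soft_limit(1024) before starting the crawl gives a quick sanity check that the planned number of simultaneous connections will not run into errno 24.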

I am admittedly feeling around in the dark, but coupling this with asyncio.Semaphore(1024) led to neither of the two exceptions above being raised:

# Then call this from bulk_get with asyncio.Semaphore(n)
async def bounded_get(sem, url, session) -> str:
    async with sem:
        return await get(url, session)

Of the ~25k input URLs, only ~100 GET requests returned exceptions, mainly due to those websites being legitimately broken, with total time to completion coming in within a few minutes, acceptable in my opinion.
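For anyone wiring this up, the semaphore approach can be sketched independently of aiohttp as follows. The names bounded and gather_bounded are hypothetical, and coro_fn stands in for any single-argument coroutine function; the default limit of 1024 simply mirrors the value used above.

```python
import asyncio


async def bounded(sem: asyncio.Semaphore, coro_fn, *args):
    """Run coro_fn(*args) only while holding a slot in the semaphore."""
    async with sem:
        return await coro_fn(*args)


async def gather_bounded(coro_fn, items, limit: int = 1024):
    """Schedule one task per item, but let at most `limit` of them run at a time."""
    sem = asyncio.Semaphore(limit)  # created inside the running event loop
    return await asyncio.gather(
        *(bounded(sem, coro_fn, item) for item in items),
        return_exceptions=True,  # keep exceptions in the result list
    )
```

Every task is still created up front, but only `limit` of them can be inside the request at once, which caps both the number of simultaneous DNS lookups and the number of open sockets.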

