
Python asyncio/aiohttp: ValueError: too many file descriptors in select() on Windows

Note: future readers, be aware that this question is old and was formatted and programmed in a rush. The answers given may be useful, but the question and code probably are not.

Hello everyone,

I'm having trouble understanding asyncio and aiohttp and making the two work together. Because I don't fully understand what I'm doing, I've run into a problem that I have no idea how to solve.

I'm using 64-bit Windows 10.

The following code returns a list of pages that do not contain "html" in the Content-Type header. It's implemented using asyncio.

import asyncio
import aiohttp

MAXitems = 30

async def getHeaders(url, session, sema):
    async with session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except:
                        return url, False
            except:
                return url, False


def check_urls_without_html(list_of_urls):
    headers_without_html = set()
    while(len(list_of_urls) != 0):
        blockurls = []
        print(len(list_of_urls))
        items = 0
        for num in range(0, len(list_of_urls)):
            if num < MAXitems:
                blockurls.append(list_of_urls[num - items])
                list_of_urls.remove(list_of_urls[num - items])
                items += 1
        loop = asyncio.get_event_loop()
        semaphoreHeaders = asyncio.Semaphore(50)
        session = aiohttp.ClientSession()
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if not header[1]:
                headers_without_html.add(header)
    return headers_without_html


list_of_urls = ['http://www.google.com', 'http://www.reddit.com']
headers_without_html = check_urls_without_html(list_of_urls)

for header in headers_without_html:
    print(header[0])

When I run it with too many URLs (e.g. 2000), it sometimes returns an error like this one:

data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 454, in run_until_complete
    self.run_forever()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 421, in run_forever
    self._run_once()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 1390, in _run_once
    event_list = self._selector.select(timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()

I've read that this problem arises from a Windows restriction. I've also read that there is not much that can be done about it, other than trying to use fewer file descriptors.

I've seen people push thousands of requests with asyncio and aiohttp, but even with my chunking I can't push 30-50 without getting this error.

Is there something fundamentally wrong with my code or is it an inherent problem with Windows? Can it be fixed? Can one increase the limit on the maximum number of allowed file descriptors in select?
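(As an aside: the manual index bookkeeping in check_urls_without_html, with num, items, and remove, can be replaced with plain slicing. The helper below is a hypothetical simplification, not part of the original code:)

```python
def chunked(items, size):
    """Yield successive size-length slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Example: split 7 URLs into batches of 3.
batches = list(chunked(['a', 'b', 'c', 'd', 'e', 'f', 'g'], 3))
# batches == [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```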

Josep asked Dec 06 '17


3 Answers

By default, Windows can only use 64 sockets in an asyncio loop. This is a limitation of the underlying select() API call.

To increase the limit, use a ProactorEventLoop; you can use the code below. See the full docs here.

import sys
import asyncio

if sys.platform == 'win32':
    # The proactor event loop uses IOCP instead of select(),
    # so it is not subject to the 64-socket limit.
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)

Another solution is to limit the overall concurrency using a semaphore; see the answer provided here. For example, when making 2000 API calls you might not want too many requests open in parallel (they might time out, and it becomes harder to see the individual call times). This will give you

await gather_with_concurrency(100, *my_coroutines)
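(The linked answer isn't reproduced on this page. A common way to implement gather_with_concurrency is to wrap each coroutine in a semaphore; the sketch below is my assumption of that pattern, not code quoted from the linked answer:)

```python
import asyncio

async def gather_with_concurrency(n, *coros):
    # Allow at most n of the given coroutines to run at the same time.
    semaphore = asyncio.Semaphore(n)

    async def sem_coro(coro):
        async with semaphore:
            return await coro

    # gather preserves the order of results, matching the input order.
    return await asyncio.gather(*(sem_coro(c) for c in coros))
```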
Andrew Svetlov answered Nov 15 '22


I'm having the same problem. Not 100% sure that this is guaranteed to work, but try replacing this:

session = aiohttp.ClientSession()

with this:

connector = aiohttp.TCPConnector(limit=60)
session = aiohttp.ClientSession(connector=connector)

By default, limit is set to 100, meaning that the client can have at most 100 simultaneous connections open at a time. As Andrew mentioned, Windows can only have 64 sockets open at a time with the default loop, so we provide a number lower than 64 instead.

James Ko answered Nov 15 '22


# Add this to the calling code before running the loop:
import asyncio

loop = asyncio.ProactorEventLoop()
asyncio.set_event_loop(loop)
Michael Kor answered Nov 15 '22