I am trying to teach myself Python's async functionality. To do so I have built an async web scraper. I would like to limit the total number of connections I have open at once to be a good citizen on servers. I know that semaphore's are a good solution, and the asyncio library has a semaphore class built in. My issue is that Python complains when using yield from
in an async
function as you are combining yield
and await
syntax. Below is the exact syntax I am using...
import asyncio
import aiohttp
sema = asyncio.BoundedSemaphore(5)
async def get_page_text(url):
with (yield from sema):
try:
resp = await aiohttp.request('GET', url)
if resp.status == 200:
ret_val = await resp.text()
except:
raise ValueError
finally:
await resp.release()
return ret_val
Raising this Exception:
File "<ipython-input-3-9b9bdb963407>", line 14
with (yield from sema):
^
SyntaxError: 'yield from' inside async function
Some possible solution I can think of...
@asyncio.coroutine
decoratorI am very new to Python's async functionality so I could be missing something obvious.
From the asyncio docs: A semaphore manages an internal counter which is decremented by each acquire() call and incremented by each release() call. The counter can never go below zero; when acquire() finds that it is zero, it blocks, waiting until some task calls release() .
A Semaphore can be released more times than it's acquired, and that will raise its counter above the starting value. A BoundedSemaphore can't be raised above the starting value.
Asyncio vs threading: Async runs one block of code at a time while threading just one line of code at a time. With async, we have better control of when the execution is given to other block of code but we have to release the execution ourselves.
Simply speaking, thread-safe means that it is safe when more than one thread access the same resource and I know Asyncio use a single thread fundamentally. However, more than one Asyncio Task could access a resource multiple time at a time like multi-threading .
You can use the async with
statement to get an asynchronous context manager:
#!/usr/local/bin/python3.5
import asyncio
from aiohttp import ClientSession
sema = asyncio.BoundedSemaphore(5)
async def hello(url):
async with ClientSession() as session:
async with sema, session.get(url) as response:
response = await response.read()
print(response)
loop = asyncio.get_event_loop()
loop.run_until_complete(hello("http://httpbin.org/headers"))
Example taken from here. The page is also a good primer for asyncio
and aiohttp
in general.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With