I'm writing a spider to crawl web pages. I know asyncio maybe my best choice. So I use coroutines to process the work asynchronously. Now I scratch my head about how to quit the program by keyboard interrupt. The program could shut down well after all the works have been done. The source code could be run in python 3.5 and is attatched below.
import asyncio
import aiohttp
from contextlib import suppress
class Spider(object):
def __init__(self):
self.max_tasks = 2
self.task_queue = asyncio.Queue(self.max_tasks)
self.loop = asyncio.get_event_loop()
self.counter = 1
def close(self):
for w in self.workers:
w.cancel()
async def fetch(self, url):
try:
async with aiohttp.ClientSession(loop = self.loop) as self.session:
with aiohttp.Timeout(30, loop = self.session.loop):
async with self.session.get(url) as resp:
print('get response from url: %s' % url)
except:
pass
finally:
pass
async def work(self):
while True:
url = await self.task_queue.get()
await self.fetch(url)
self.task_queue.task_done()
def assign_work(self):
print('[*]assigning work...')
url = 'https://www.python.org/'
if self.counter > 10:
return 'done'
for _ in range(self.max_tasks):
self.counter += 1
self.task_queue.put_nowait(url)
async def crawl(self):
self.workers = [self.loop.create_task(self.work()) for _ in range(self.max_tasks)]
while True:
if self.assign_work() == 'done':
break
await self.task_queue.join()
self.close()
def main():
loop = asyncio.get_event_loop()
spider = Spider()
try:
loop.run_until_complete(spider.crawl())
except KeyboardInterrupt:
print ('Interrupt from keyboard')
spider.close()
pending = asyncio.Task.all_tasks()
for w in pending:
w.cancel()
with suppress(asyncio.CancelledError):
loop.run_until_complete(w)
finally:
loop.stop()
loop.run_forever()
loop.close()
if __name__ == '__main__':
main()
But if I press 'Ctrl+C' while it's running, some strange errors may occur. I mean sometimes the program could be shut down by 'Ctrl+C' gracefully. No error message. However, in some cases the program will be still running after pressing 'Ctrl+C' and wouldn't stop until all the works have been done. If I press 'Ctrl+C' at that moment, 'Task was destroyed but it is pending!' would be there.
I have read some topics about asyncio and add some code in main() to close coroutines gracefully. But it not work. Is someone else has the similar problems?
I bet problem happens here:
except:
pass
You should never do such thing. And your situation is one more example of what can happen otherwise.
When you cancel task and await for its cancellation, asyncio.CancelledError
raised inside task and shouldn't be suppressed anywhere inside. Line where you await of your task cancellation should raise this exception, otherwise task will continue execution.
That's why you do
task.cancel()
with suppress(asyncio.CancelledError):
loop.run_until_complete(task) # this line should raise CancelledError,
# otherwise task will continue
to actually cancel task.
Upd:
But I still hardly understand why the original code could quit well by 'Ctrl+C' at a uncertain probability?
It dependence of state of your tasks:
CancelledError
on awaiting and your code will finished normally.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With