
How to gracefully shut down coroutines with Ctrl+C?

I'm writing a spider to crawl web pages. I know asyncio may be my best choice, so I use coroutines to do the work asynchronously. Now I'm scratching my head over how to quit the program on a keyboard interrupt. The program shuts down cleanly once all the work has been done. The source code runs on Python 3.5 and is attached below.

import asyncio
import aiohttp
from contextlib import suppress

class Spider(object):
    def __init__(self):
        self.max_tasks = 2
        self.task_queue = asyncio.Queue(self.max_tasks)
        self.loop = asyncio.get_event_loop()
        self.counter = 1

    def close(self):
        for w in self.workers:
            w.cancel()

    async def fetch(self, url):
        try:
            async with aiohttp.ClientSession(loop = self.loop) as self.session:
                with aiohttp.Timeout(30, loop = self.session.loop):
                    async with self.session.get(url) as resp:
                        print('get response from url: %s' % url)
        except:
            pass
        finally:
            pass

    async def work(self):
        while True:
            url = await self.task_queue.get()
            await self.fetch(url)
            self.task_queue.task_done()

    def assign_work(self):
        print('[*]assigning work...')
        url = 'https://www.python.org/'
        if self.counter > 10:
            return 'done'
        for _ in range(self.max_tasks):
            self.counter += 1
            self.task_queue.put_nowait(url)

    async def crawl(self):
        self.workers = [self.loop.create_task(self.work()) for _ in range(self.max_tasks)]
        while True:
            if self.assign_work() == 'done':
                break
            await self.task_queue.join()
        self.close()

def main():
    loop = asyncio.get_event_loop()
    spider = Spider()
    try:
        loop.run_until_complete(spider.crawl())
    except KeyboardInterrupt:
        print ('Interrupt from keyboard')
        spider.close()
        pending  = asyncio.Task.all_tasks()
        for w in pending:
            w.cancel()
            with suppress(asyncio.CancelledError):
                loop.run_until_complete(w)
    finally:
        loop.stop()
        loop.run_forever()
        loop.close()

if __name__ == '__main__':
    main()

But if I press 'Ctrl+C' while it's running, strange things may happen. Sometimes the program shuts down gracefully with no error message. In other cases, however, the program keeps running after 'Ctrl+C' and doesn't stop until all the work has been done. If I press 'Ctrl+C' again at that moment, I get 'Task was destroyed but it is pending!'.

I have read some topics about asyncio and added some code in main() to close the coroutines gracefully, but it doesn't work. Has anyone else had a similar problem?

asked by xssl, Oct 17 '22 08:10



1 Answer

I bet the problem happens here:

except:
    pass

You should never do this, and your situation is one more example of what can go wrong when you do: a bare except also catches asyncio.CancelledError.

When you cancel a task and await its cancellation, asyncio.CancelledError is raised inside the task and must not be suppressed anywhere inside it. The line where you await the task's cancellation should raise this exception; otherwise the task will continue executing.

That's why you do

task.cancel()
with suppress(asyncio.CancelledError):
    loop.run_until_complete(task)  # this line should raise CancelledError, 
                                   # otherwise task will continue

to actually cancel the task.
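The difference can be seen in a small stdlib-only sketch. The names swallowing_worker, cooperative_worker and cancel_after_start are made up for illustration; the first worker copies the bare except from the question, the second re-raises the cancellation. This is a minimal demonstration under those assumptions, not the spider itself.

```python
import asyncio


async def swallowing_worker(log):
    # Hypothetical worker that mimics the question's fetch():
    # the bare except hides the cancellation request.
    for step in range(3):
        try:
            await asyncio.sleep(0.01)
        except:  # deliberately too broad: also catches CancelledError
            log.append('swallowed')
        log.append('step %d' % step)


async def cooperative_worker(log):
    # Hypothetical worker that lets the cancellation propagate.
    for step in range(3):
        try:
            await asyncio.sleep(0.01)
        except asyncio.CancelledError:
            log.append('cancelled')
            raise
        log.append('step %d' % step)


async def cancel_after_start(worker):
    log = []
    task = asyncio.ensure_future(worker(log))
    await asyncio.sleep(0)  # let the task run up to its first await
    task.cancel()
    try:
        await task  # raises CancelledError only if the task really stopped
    except asyncio.CancelledError:
        pass
    return log, task.cancelled()
```

Run with asyncio.run(cancel_after_start(...)) (Python 3.7+, newer than the 3.5 used in the question). The swallowing worker logs 'swallowed' and then still completes all three steps, and task.cancelled() is False. The cooperative worker logs only 'cancelled' and task.cancelled() is True.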

Upd:

But I still hardly understand why the original code could sometimes quit cleanly on 'Ctrl+C'?

It depends on the state of your tasks:

  1. If all tasks are done at the moment you press 'Ctrl+C', none of them will raise CancelledError when awaited, and your code will finish normally.
  2. If some tasks are pending but close to finishing at the moment you press 'Ctrl+C', your code will hang briefly on task cancellation and finish shortly afterwards, once the tasks have finished.
  3. If some tasks are pending and far from finished at the moment you press 'Ctrl+C', your code will hang trying to cancel these tasks (which can't be done). Another 'Ctrl+C' will interrupt the cancelling process, but the tasks will be neither cancelled nor finished by then, and you'll get the warning 'Task was destroyed but it is pending!'.
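For comparison, here is a rough sketch of a worker loop that stays cancellable in all three cases. fake_fetch is a hypothetical stand-in for the aiohttp request (it just sleeps), and crawl_then_cancel simulates the interrupt with a short delay instead of a real 'Ctrl+C'. The exception types caught are an assumption about which failures are worth ignoring.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for the aiohttp request in the question.
    await asyncio.sleep(10)
    return url


async def work(queue, results):
    while True:
        url = await queue.get()
        try:
            results.append(await fake_fetch(url))
        except asyncio.CancelledError:
            raise  # never hide cancellation from whoever awaits this task
        except (OSError, asyncio.TimeoutError) as exc:
            # Only swallow errors we actually expect from a failed download.
            print('failed to fetch %s: %r' % (url, exc))
        finally:
            queue.task_done()


async def crawl_then_cancel(n_workers=2):
    queue = asyncio.Queue()
    for _ in range(4):
        queue.put_nowait('https://www.python.org/')
    results = []
    workers = [asyncio.ensure_future(work(queue, results))
               for _ in range(n_workers)]
    await asyncio.sleep(0.05)  # simulate the interrupt arriving mid-download
    for w in workers:
        w.cancel()
    # return_exceptions=True collects each CancelledError instead of raising it.
    await asyncio.gather(*workers, return_exceptions=True)
    return [w.cancelled() for w in workers], results
```

Because the workers re-raise CancelledError, both tasks end in the cancelled state almost immediately, even though each download would have taken ten seconds.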
answered by Mikhail Gerasimov, Oct 21 '22 00:10
