
How to use Tornado with APScheduler?

I am running Python's apscheduler and periodically want to POST to some HTTP resources, using Tornado's AsyncHTTPClient inside a scheduled job. Each job will do several POSTs. When each HTTP request gets a response, a callback is called (I think Tornado uses a future to accomplish this).

I am concerned about thread safety here, since apscheduler runs jobs in various threads. I have not been able to find a well-explained example of how Tornado is best used across multiple threads in this context.

How can I best use apscheduler with tornado in this manner?

Specific concerns:

  1. Which tornado ioloop to use? The docs say that AsyncHTTPClient "works like magic". Well, magic scares me. Do I need to use AsyncHTTPClient from within the current thread or can I use the main one (it can be specified)?

  2. Are there thread-safety issues with my callback with respect to which ioloop I use?

  3. It is not clear to me what happens when a thread completes while there is still a pending callback/future that needs to be called. Are there issues here?

  4. Since apscheduler runs its jobs as threads in-process, and Python has the GIL, is having one IOLoop in the main thread pretty much the same, with respect to performance, as having multiple loops in different threads?

Asked by Rocketman on May 05 '13 at 21:05



1 Answer

  1. All of Tornado's utilities are built around Tornado's IOLoop, and that includes AsyncHTTPClient. An IOLoop is not thread-safe; the one documented exception is IOLoop.add_callback, which may be called from any thread. Therefore, it is not a good idea to use AsyncHTTPClient from any thread other than the one running your main IOLoop. For more details on how to use the IOLoop, see the Tornado documentation for IOLoop.

  2. If you use tornado.ioloop.IOLoop.instance(), then I suppose you will run into issues whenever your intention is not to add callbacks to the main thread's IOLoop, because instance() returns the global (main) IOLoop. (Since Tornado 5.0, instance() is simply an alias for current().) You can use tornado.ioloop.IOLoop.current() to refer to the IOLoop that belongs to the calling thread. Adding a callback from one non-main thread's IOLoop to another non-main thread's IOLoop takes far too much bookkeeping and will just get messy.

  3. I don't quite get this, but the way I understand it there are two scenarios: the thread either has an IOLoop running or it does not. If the thread does not have an IOLoop running, then once the thread has finished its work, any callback that has to be executed by an IOLoop in some other thread (perhaps the main thread) will still be executed there. If the thread does have an IOLoop running, the thread won't complete until you stop that IOLoop, so whether the callback executes depends on when you stop the IOLoop.

  4. Honestly, I don't see much point in using threads with Tornado. There won't be any performance gain, because the GIL lets only one thread run Python code at a time (PyPy has a GIL too, and I am not sure how well Tornado plays with it anyway). If your Tornado app is a web server, you might as well run multiple processes of it and use Nginx as a proxy and load balancer. Since you have brought in apscheduler, I would suggest using IOLoop's add_timeout, which does pretty much what you need and, being native to Tornado, plays much nicer with it (add_timeout fires once; tornado.ioloop.PeriodicCallback covers repeating jobs). Callbacks are much more difficult to debug anyway; combine them with Python's threading and you can have a massive mess. If you are ready to consider another option, just move all the async processing out of this process, which will make life much easier. Think of something like Celery for this.
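
If you do keep the scheduler threads, the handoff described in points 1 to 3 could look roughly like the sketch below. To keep it runnable without extra packages it uses the standard library only: asyncio stands in for Tornado's IOLoop (Tornado 5.0 and later run on top of asyncio), plain threads stand in for apscheduler's workers, and asyncio.sleep stands in for the HTTP POST. The names post_all, scheduled_job and run_demo and the example.invalid URLs are made up for illustration. With Tornado itself, the thread-safe entry point would be IOLoop.add_callback rather than asyncio.run_coroutine_threadsafe.

```python
import asyncio
import threading


async def post_all(results, job_id, urls):
    """Coroutine that runs on the loop thread; one iteration per POST."""
    for url in urls:
        # Placeholder for real I/O. With Tornado this would be an awaited
        # AsyncHTTPClient().fetch(url, method="POST", body=...).
        await asyncio.sleep(0)
        results.append((job_id, url, threading.current_thread().name))
    return len(urls)


def scheduled_job(loop, results, job_id, urls):
    """Body of a scheduled job; runs in a worker thread, not the loop thread."""
    # Thread-safe handoff: the coroutine is queued onto the loop's own thread.
    future = asyncio.run_coroutine_threadsafe(
        post_all(results, job_id, urls), loop
    )
    # Optional: block this worker until the POSTs finish. Returning right
    # away is also fine, because the loop thread owns the pending work.
    return future.result(timeout=5)


def run_demo(job_count=3):
    results = []  # only ever mutated on the loop thread
    loop = asyncio.new_event_loop()
    loop_thread = threading.Thread(target=loop.run_forever, name="io-loop")
    loop_thread.start()

    # Plain threads stand in for apscheduler's worker pool (an assumption
    # made so that the sketch needs no third-party packages).
    urls = ["http://example.invalid/a", "http://example.invalid/b"]
    workers = [
        threading.Thread(target=scheduled_job, args=(loop, results, i, urls))
        for i in range(job_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    # Stopping the loop from another thread must also go through the loop.
    loop.call_soon_threadsafe(loop.stop)
    loop_thread.join()
    loop.close()
    return results
```

Every recorded entry carries the name of the loop thread, not of the worker that scheduled it, which is the point: the job threads only enqueue work, and all request handling and callbacks stay on the single thread that owns the loop.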

Answered by vaidik on Oct 31 '22 at 04:10