
How to use python to schedule tasks in a Django application

I'm new to Django and web frameworks in general. I have an app that is all set up and works perfectly fine on my localhost.

The program uses Twitter's API to gather a bunch of tweets and displays them to the user. The only problem is I need my python program that gets the tweets to be run in the background every-so-often.

This is where using the schedule module would make sense, but once I start the local server it never runs the scheduled functions. I tried reading up on cron jobs but can't seem to get them to work. How can I get Django to run a specific Python file periodically?

awoldt asked Jun 22 '20 23:06



2 Answers

I've encountered a similar situation and have had a lot of success with django-apscheduler. It is all self-contained - it runs with the Django server and jobs are tracked in the Django database, so you don't have to configure any external cron jobs or anything to call a script.
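The general idea of a scheduler that lives inside the server process can be sketched with the standard library alone. This is only an illustration of the concept, not how APScheduler is implemented, and run_every is a hypothetical helper name:

```python
import threading


def run_every(interval_seconds, job):
    """Run job() repeatedly in a daemon thread; return an Event that stops it."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop.set() is called,
        # so the loop runs the job after each interval until it is stopped.
        while not stop.wait(interval_seconds):
            job()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Because the thread is a daemon, it does not keep the process alive by itself; it only runs for as long as the server process does. Calling stop.set() on the returned event ends the loop.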

Below is a basic way to get up and running quickly, but the links at the end of this post have far more documentation and details as well as more advanced options.

Install with pip install django-apscheduler then add it to your INSTALLED_APPS:

INSTALLED_APPS = [
    ...
    'django_apscheduler',
    ...
]

Once installed, run python manage.py migrate to create the database tables django-apscheduler uses to track jobs.

Create a scheduler python package (a folder in your app directory named scheduler with a blank __init__.py in it). Then, in there, create a file named scheduler.py, which should look something like this:

from apscheduler.schedulers.background import BackgroundScheduler
from django_apscheduler.jobstores import DjangoJobStore, register_events
from django.utils import timezone
import sys

# This is the function you want to schedule - add as many as you want and then register them in the start() function below
def deactivate_expired_accounts():
    today = timezone.now()
    ...
    # get accounts, expire them, etc.
    ...


def start():
    scheduler = BackgroundScheduler()
    scheduler.add_jobstore(DjangoJobStore(), "default")
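    # run this job every 24 hours; the fixed id plus replace_existing=True
    # keeps a server restart from storing a duplicate job in the database
    scheduler.add_job(deactivate_expired_accounts, 'interval', hours=24, id='clean_accounts', name='clean_accounts', jobstore='default', replace_existing=True)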
    register_events(scheduler)
    scheduler.start()
    print("Scheduler started...", file=sys.stdout)

In your apps.py file (create it if it doesn't exist):

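from django.apps import AppConfig


class AppNameConfig(AppConfig):
    name = 'your_app_name'

    def ready(self):
        # import here, once the apps are loaded; the scheduler package
        # lives inside this app, so use a relative import
        from .scheduler import scheduler
        scheduler.start()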

A word of caution: when using this with DEBUG = True in your settings.py file, run the development server with the --noreload flag set (i.e. python manage.py runserver localhost:8000 --noreload), otherwise the scheduled tasks will start and run twice.
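If passing --noreload every time is inconvenient, an alternative is to check which process you are in before starting the scheduler. A minimal sketch, assuming Django's development autoreloader sets the RUN_MAIN environment variable to "true" in the child process that actually serves requests. The helper name should_start_scheduler is hypothetical, not part of Django or django-apscheduler:

```python
import os
import sys


def should_start_scheduler(argv=None, environ=None):
    """Decide whether this process should start the background scheduler.

    Hypothetical helper: under `runserver` with autoreload enabled, only the
    child process (marked by RUN_MAIN=true) should start the scheduler.
    """
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ

    if "runserver" not in argv:
        # Not the development server (e.g. a WSGI server): no autoreloader.
        return True
    if "--noreload" in argv:
        # Single process, so there is nothing to deduplicate.
        return True
    # Autoreload is on: the parent only watches files, the child serves.
    return environ.get("RUN_MAIN") == "true"
```

You would call it in ready() and only run scheduler.start() when it returns True. Note that this sketch also returns True for other management commands such as migrate, since they load the app config as well; you may want to exclude those.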

Also, django-apscheduler does not allow you to pass any parameters to the functions that are scheduled to be run. It is a limitation, but I've never had a problem with it. You can load them from some external source, like the Django database, if you really need to.

You can use all the standard Django libraries, packages and functions inside the apscheduler tasks (functions): for example, to query models, call external APIs, or parse responses and data. It's seamlessly integrated.

Some additional links:

  • Project repository: https://github.com/jarekwg/django-apscheduler
  • More documentation: https://medium.com/@mrgrantanderson/replacing-cron-and-running-background-tasks-in-django-using-apscheduler-and-django-apscheduler-d562646c062e
Michael Hawkins answered Nov 15 '22 00:11



Another library you can use is django-q

"Django Q is a native Django task queue, scheduler and worker application using Python multiprocessing."

Like django-apscheduler, it can run and track jobs using the database Django is attached to, or it can use a full-blown broker like Redis.

"The only problem is I need my python program that gets the tweets to be run in the background every-so-often."

That sounds like a scheduler. (django-q also has a tasks feature, where tasks can be triggered by events rather than run on a schedule. The scheduler sits on top of the task feature and triggers tasks on a defined schedule.)

There are four parts to this with django-q:

  1. Install Django-q and configure it;
  2. Define a task function (or set of functions) that you want to fetch the tweets;
  3. Define a schedule that runs the tasks;
  4. Run the django-q cluster that'll process the schedule and tasks.

Install django-q

pip install django-q

Configure it as an installed app in Django's settings.py (add it to the INSTALLED_APPS list):

INSTALLED_APPS = [
    ...
    'django_q',
    ...
]

Then it needs its own configuration in settings.py (this configuration uses the database as the broker rather than Redis or something else external to Django):

# Settings for Django-Q
# https://mattsegal.dev/simple-scheduled-tasks.html

Q_CLUSTER = {
    'orm': 'default',  # should use django's ORM and database as a broker.
    'workers': 4,
    'timeout': 30,
    'retry': 60,
    'queue_limit': 50,
    'bulk': 10,
}
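One thing worth checking in a configuration like this: as I understand the django-q documentation, retry should be larger than timeout, otherwise a task that is still running may be presented to a worker again. A small sketch of that sanity check, where check_q_cluster is a hypothetical helper and not part of django-q:

```python
def check_q_cluster(config):
    """Return a list of warnings for a Q_CLUSTER-style dict (hypothetical helper)."""
    warnings = []
    timeout = config.get("timeout")
    retry = config.get("retry")
    # Only compare when both values are set explicitly.
    if timeout is not None and retry is not None and retry <= timeout:
        warnings.append(
            f"retry ({retry}) should be larger than timeout ({timeout})"
        )
    return warnings
```

The values above (timeout of 30, retry of 60) pass this check.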

You'll then need to run migrations on the database to create the tables django-q uses:

python manage.py migrate

(This will create a bunch of schedule and task related tables in the database. They can be viewed and manipulated through the Django admin panel.)

Define a task function

Then create a new file for the tasks you want to run:

# app/tasks.py
def fetch_tweets():
    pass  # do whatever logic you want here

Define a task schedule

We need to add the schedule that runs the task to the database.

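# start a Django shell first with: python manage.py shell
from django_q.models import Schedule
Schedule.objects.create(
    func='app.tasks.fetch_tweets',  # module and func to run
    schedule_type=Schedule.MINUTES,  # needed for `minutes` to take effect (the default type runs once)
    minutes=5,  # run every 5 minutes
    repeats=-1  # keep repeating, repeat forever
)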

You don't have to do this through the shell; you can do it from a Python module, for example. But you probably only need to create the schedule once.
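If you do create the schedule from code, it helps to make the call safe to run more than once. A minimal sketch using the ORM's get_or_create; ensure_schedule is a hypothetical helper, and the model is passed in as an argument (expected to be django_q.models.Schedule) so the function itself has no import-time dependency on Django:

```python
def ensure_schedule(schedule_model, func_path, every_minutes):
    """Create a repeating schedule for func_path only if none exists yet.

    schedule_model is assumed to be django_q.models.Schedule.
    Returns the (schedule, created) pair from get_or_create.
    """
    schedule, created = schedule_model.objects.get_or_create(
        func=func_path,
        defaults={
            "schedule_type": schedule_model.MINUTES,
            "minutes": every_minutes,
            "repeats": -1,  # repeat forever
        },
    )
    return schedule, created
```

In a project you would call it as ensure_schedule(Schedule, 'app.tasks.fetch_tweets', 5) after from django_q.models import Schedule, for example from a management command. Running it a second time finds the existing row instead of adding a duplicate.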

Run the cluster

Once that's all done, you need to run the cluster that will process the schedule; without it, the schedule and tasks will never be processed. The call to qcluster is blocking, so you normally want to run it in a separate window or process from the Django server process.

python manage.py qcluster

When it runs you'll see output like:

09:33:00 [Q] INFO Q Cluster fruit-november-wisconsin-hawaii starting.
09:33:00 [Q] INFO Process-1:1 ready for work at 11
09:33:00 [Q] INFO Process-1:2 ready for work at 12
09:33:00 [Q] INFO Process-1:3 ready for work at 13
09:33:00 [Q] INFO Process-1:4 ready for work at 14
09:33:00 [Q] INFO Process-1:5 monitoring at 15
09:33:00 [Q] INFO Process-1 guarding cluster fruit-november-wisconsin-hawaii
09:33:00 [Q] INFO Q Cluster fruit-november-wisconsin-hawaii running.

The django-q documentation also has some pretty useful examples if you want to see how to hook up tasks to reports, emails, signals, etc.

Donal answered Nov 15 '22 00:11
