
Using python async / await with django restframework

I am just upgrading an older project to Python 3.6, and found out that there are these cool new async / await keywords.

My project contains a web crawler that is not very performant at the moment and takes about 7 minutes to complete. Since I already have Django REST framework in place to access the data of my Django application, I thought it would be nice to have a REST endpoint that starts the crawler remotely with a simple POST request.

However, I don't want the client to wait synchronously for the crawler to complete. I just want to immediately send the client a message that the crawler has been started, and run the crawler in the background.

from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings
from mycrawler import tasks

async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    await tasks.update_all(deep_crawl, season, log_to_db)


@api_view(['POST', 'GET'])
def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))

        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)

        # this should be called async
        update_all_async(season=season, deep_crawl=deep)

        # A plain string: a set such as {"..."} is not JSON-serializable
        return Response({"Success": "crawler started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.", "allowed_parameters": {
            "deep": "boolean",
            "season": "number"
        }}, status.HTTP_200_OK)

I have read some tutorials, including how to use event loops, but I don't really get it... Where should I start the loop in this case?
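For reference, a minimal plain-asyncio sketch (no Django) of why the call in the view above never runs the crawler: calling an async function only creates a coroutine object, and its body runs only once an event loop drives it. `crawl` is a hypothetical stand-in for `tasks.update_all`.

```python
import asyncio

ran = []

async def crawl(season):
    # Hypothetical stand-in for tasks.update_all
    ran.append(season)
    return "crawled {}".format(season)

# Calling the coroutine function only creates a coroutine object; nothing runs yet
coro = crawl(2017)
print(type(coro).__name__)  # coroutine
print(ran)                  # []

# An event loop must drive it; asyncio.run() creates a loop, runs the coroutine, closes the loop
result = asyncio.run(coro)
print(result)               # crawled 2017
print(ran)                  # [2017]
```

So in a sync view, `update_all_async(...)` on its own just creates a coroutine that is never awaited or scheduled; Python typically emits a RuntimeWarning about it being "never awaited".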

[EDIT] 20/10/2017:

I solved it using threading for now, since it really is a "fire and forget" task. However, I still would like to know how to achieve the same thing using async / await.

Here's my current solution:

import threading


@api_view(['POST', 'GET'])
def start(request):
    ...
    t = threading.Thread(target=tasks.update_all, args=(deep, season))
    t.start()
    ...
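One caveat with the threading approach: if `tasks.update_all` were a coroutine function (as the `await` in `update_all_async` implies), passing it directly as the `Thread` target would only create a coroutine object without running it. A minimal sketch, assuming a hypothetical async `update_all`, runs the coroutine inside the thread with `asyncio.run()`, which gives that thread its own event loop.

```python
import asyncio
import threading

results = []

async def update_all(deep_crawl, season, log_to_db=True):
    # Hypothetical async stand-in for tasks.update_all
    await asyncio.sleep(0.1)  # pretend to crawl
    results.append((deep_crawl, season))

def run_in_background(deep, season):
    def worker():
        # asyncio.run() creates a fresh event loop in this worker thread,
        # runs the coroutine to completion, then closes the loop
        asyncio.run(update_all(deep, season))

    t = threading.Thread(target=worker)
    t.start()
    return t

t = run_in_background(True, 2017)
# The caller continues immediately; join() is only here so the demo can check the result
t.join()
print(results)  # [(True, 2017)]
```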
Asked by platzhersh on Oct 18 '17 at 22:10




1 Answer

This is possible since Django 3.1, which introduced asynchronous view support.

Regarding the event loop: to make use of one, run Django with uvicorn or another ASGI server instead of a WSGI server such as gunicorn with its default sync workers. The difference is that under ASGI there is already a long-lived running loop, whereas under WSGI Django runs each async view in its own one-off event loop, which is gone once the request finishes. With ASGI, you can simply define async view functions in views.py, or (since Django 4.1) async handler methods on class-based views.
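A small plain-asyncio sketch of the "already a running loop" point: `asyncio.get_running_loop()` only succeeds while code executes inside a running event loop (as an async view does under an ASGI server) and raises `RuntimeError` otherwise. `inside_async_view` is a hypothetical stand-in for such a view.

```python
import asyncio

def loop_available():
    # True only if called while an event loop is running in this thread
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False

async def inside_async_view():
    # Hypothetical stand-in for an async view executed by the server's loop
    return loop_available()

outside = loop_available()
inside = asyncio.run(inside_async_view())
print(outside, inside)  # False True
```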

Assuming you go with ASGI, there are multiple ways to achieve this. I'll describe two of them (other options could use an asyncio.Queue, for example):
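The asyncio.Queue option is only mentioned in passing; here is a minimal sketch of the idea, assuming a single long-lived worker task started once on the server's event loop (how to start it at application startup is not shown). A view would only enqueue a job and return. `crawl`, `crawl_worker` and the `(deep, season)` job tuple are hypothetical.

```python
import asyncio

async def crawl(deep, season):
    # Hypothetical async stand-in for tasks.update_all
    await asyncio.sleep(0.01)
    return (deep, season)

async def crawl_worker(queue, finished):
    # Long-lived consumer: processes crawl requests one at a time
    while True:
        deep, season = await queue.get()
        try:
            finished.append(await crawl(deep, season))
        finally:
            queue.task_done()

async def main():
    queue = asyncio.Queue()
    finished = []
    worker = asyncio.create_task(crawl_worker(queue, finished))

    # What a view would do: enqueue the job and return without waiting
    queue.put_nowait((True, 2017))
    queue.put_nowait((False, 2018))

    await queue.join()  # demo only: wait until both jobs are processed
    worker.cancel()
    return finished

finished = asyncio.run(main())
print(finished)  # [(True, 2017), (False, 2018)]
```

A side effect of using a single worker is that two POST requests cannot start overlapping crawls; they are processed in order.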

  1. Make start() async

By making start() async, you can use the existing running loop directly: scheduling the crawl as an asyncio.Task lets you fire and forget it on that loop. If you want to fire but remember, you can create a second Task that follows up on the first one, e.g.:

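import asyncio
import json

from django.conf import settings
from django.http import JsonResponse
from mycrawler import tasks

# The event loop only keeps weak references to tasks, so keep strong ones
# here until each fire-and-forget task has finished
background_tasks = set()

def fire_and_forget(coro):
    # Schedule coro on the running loop; it starts running concurrently right away
    task = asyncio.get_running_loop().create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task

async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    # Assumes tasks.update_all is a coroutine function
    await tasks.update_all(deep_crawl, season, log_to_db)

async def follow_up_task(task: asyncio.Task, timeout: float = 15 * 60):
    # Wait until the task finishes, but at most `timeout` seconds
    # (the crawl takes ~7 minutes; 15 minutes is an arbitrary margin)
    done, _ = await asyncio.wait({task}, timeout=timeout)
    if task not in done:
        print('update_all task not completed after {} seconds, aborting'.format(timeout))
        task.cancel()
    elif task.exception() is not None:
        print('update_all task failed: {!r}'.format(task.exception()))
    else:
        print('update_all task completed: {}'.format(task.result()))


# A plain Django async view: DRF's @api_view does not accept async functions
async def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))

        data = json.loads(request.body or b'{}')
        deep = data.get('deep', False)
        season = data.get('season', settings.CURRENT_SEASON)

        # Once the task is created, it runs concurrently on the event loop
        task = fire_and_forget(update_all_async(season=season, deep_crawl=deep))

        # Fire up a second task that tracks the first one
        fire_and_forget(follow_up_task(task))

        return JsonResponse({"Success": "crawler started"}, status=200)
    else:
        return JsonResponse({"description": "Start the crawler by calling this endpoint via POST.", "allowed_parameters": {
            "deep": "boolean",
            "season": "number"
        }}, status=200)

# Remote clients won't send a CSRF token (DRF's @api_view exempted the view for you).
# CsrfViewMiddleware checks this attribute; setting it directly keeps start a coroutine function.
start.csrf_exempt = True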

  2. async_to_sync

Sometimes you can't route the request to an async function in the first place, as is the case with DRF (as of today). For this, Django provides async adapter functions (async_to_sync() and sync_to_async() from asgiref.sync), but be aware that switching between sync and async contexts comes with a small performance penalty of approximately 1ms. Note that this time, the running loop is obtained in the update_all_async function instead:

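from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response
from django.conf import settings
from mycrawler import tasks

import asyncio
from asgiref.sync import async_to_sync

# The event loop only keeps weak references to tasks, so keep strong ones here
background_tasks = set()

def fire_and_forget(coro):
    # Schedule coro on the running loop and keep a reference until it finishes
    task = asyncio.get_running_loop().create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task

async def update_all_async(deep_crawl=True, season=settings.CURRENT_SEASON, log_to_db=True):
    # Under ASGI, async_to_sync runs this coroutine on the server's event loop, so the
    # tasks created here keep running after it returns. (Under WSGI, async_to_sync uses
    # a temporary loop and cancels leftover tasks when it returns.)
    # Assumes tasks.update_all is a coroutine function
    task = fire_and_forget(tasks.update_all(deep_crawl, season, log_to_db))
    fire_and_forget(follow_up_task(task))

async def follow_up_task(task: asyncio.Task, timeout: float = 15 * 60):
    # Wait until the task finishes, but at most `timeout` seconds
    # (the crawl takes ~7 minutes; 15 minutes is an arbitrary margin)
    done, _ = await asyncio.wait({task}, timeout=timeout)
    if task not in done:
        print('update_all task not completed after {} seconds, aborting'.format(timeout))
        task.cancel()
    elif task.exception() is not None:
        print('update_all task failed: {!r}'.format(task.exception()))
    else:
        print('update_all task completed: {}'.format(task.result()))


@api_view(['POST', 'GET'])
def start(request):
    """
    Start crawling.
    """
    if request.method == 'POST':
        print("Crawler: start {}".format(request))

        deep = request.data.get('deep', False)
        season = request.data.get('season', settings.CURRENT_SEASON)

        # Call the coroutine function from this sync view; this returns
        # as soon as both tasks have been scheduled
        async_to_sync(update_all_async)(season=season, deep_crawl=deep)

        return Response({"Success": "crawler started"}, status=status.HTTP_200_OK)
    else:
        return Response({"description": "Start the crawler by calling this endpoint via POST.", "allowed_parameters": {
            "deep": "boolean",
            "season": "number"
        }}, status=status.HTTP_200_OK)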

In both cases, the view returns the 200 response quickly, but technically the second option is slower because of the extra sync/async context switch.
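Stripped of Django, the fire-and-forget pattern from option 1 boils down to the following plain-asyncio sketch. It keeps a strong reference to each background task (the event loop only holds weak ones) and lets the follow-up wait with `asyncio.wait()` and a timeout rather than a fixed sleep. `slow_crawl`, `handle_post` and the durations are hypothetical values chosen for the demo.

```python
import asyncio

background_tasks = set()
log = []

async def slow_crawl(seconds):
    # Hypothetical stand-in for tasks.update_all
    await asyncio.sleep(seconds)
    return "done after {}s".format(seconds)

async def follow_up(task, timeout):
    # Wait for the task to finish, but no longer than `timeout` seconds
    done, _ = await asyncio.wait({task}, timeout=timeout)
    if task in done:
        log.append(task.result())
    else:
        task.cancel()
        log.append("cancelled")

def fire_and_forget(coro):
    # Keep a strong reference until the task completes
    task = asyncio.get_running_loop().create_task(coro)
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return task

async def handle_post(seconds, timeout):
    # What the view does: schedule the work and answer immediately
    task = fire_and_forget(slow_crawl(seconds))
    fire_and_forget(follow_up(task, timeout))
    return "crawler started"

async def main():
    replies = [await handle_post(0.01, 1.0), await handle_post(1.0, 0.1)]
    # Demo only: give the background tasks time to finish before the loop closes
    await asyncio.sleep(0.3)
    return replies

replies = asyncio.run(main())
print(replies)  # ['crawler started', 'crawler started']
print(log)
```

Both "requests" get their reply before any crawling has happened; afterwards the short crawl has completed and the long one has been cancelled by its follow-up.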

IMPORTANT: When using Django, these async operations commonly involve DB operations. Django's database layer is synchronous, at least for now (Django 4.1 added async QuerySet methods such as aget() and acreate(), but they still run the query synchronously in a thread), so you have to take this into account in asynchronous contexts. sync_to_async() becomes very handy for these cases.
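To keep the sketch dependency-free, the following uses the standard library's `asyncio.to_thread()` (Python 3.9+) as a swapped-in analogue of `sync_to_async()`: both run a blocking call in a worker thread so the event loop is not blocked. It is not `sync_to_async` itself; in particular, `sync_to_async` by default runs thread-sensitive calls in a single shared thread, which matters for Django's DB connections. `save_result` is a hypothetical blocking function standing in for an ORM call.

```python
import asyncio
import threading
import time

def save_result(season):
    # Hypothetical blocking call standing in for a synchronous Django ORM query
    time.sleep(0.05)
    return season, threading.get_ident()

async def crawl_and_store(season):
    loop_thread = threading.get_ident()
    # Run the blocking call in a worker thread; the event loop can keep
    # serving other tasks while it sleeps
    stored, worker_thread = await asyncio.to_thread(save_result, season)
    return stored, worker_thread != loop_thread

stored, off_loop = asyncio.run(crawl_and_store(2017))
print(stored, off_loop)  # 2017 True
```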

Answered by castel on Sep 20 '22 at 05:09