
Python asyncio: skip processing until function returns

I'm still very confused about how asyncio works, so I was trying to set up a simple example but couldn't get it to work.

The following example is a web server (Quart) that receives a request to generate a large PDF. The server should return a response before it starts processing the PDF, then generate it in the background and send the download link by email later.

from quart import Quart
import asyncio
import time

app = Quart(__name__)

@app.route('/')
async def pdf():
    t1 = time.time()
    await generatePdf()
    return 'Time to execute : {} seconds'.format(time.time() - t1)

async def generatePdf():
    await asyncio.sleep(5)  # placeholder for the slow PDF generation
    # sync generatepdf
    # send pdf link to email

app.run()

How would I go about this? In the above example I don't want the 5 seconds to be waited out before the response is returned.

I'm not even sure if asyncio is what I need.

I'm also afraid that keeping the server busy after the response has been returned is something that shouldn't be done, but I'm not sure about that either.

Also the pdf library is synchronous, but I guess that's a problem for another day...

asked Jan 25 '19 by Mojimi

2 Answers

As noted in the comments, asyncio.create_task has everything you need to respond to the web request right away and schedule the PDF generation for later:

asyncio.create_task(generatePdf())
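For example, a minimal sketch of the question's route using create_task (keeping generatePdf async as in the question) might look like:

from quart import Quart
import asyncio

app = Quart(__name__)

@app.route('/')
async def pdf():
    # schedule the coroutine on the running event loop and return right away;
    # in production, keep a reference to the task so it isn't garbage collected
    asyncio.create_task(generatePdf())
    return 'PDF generation started, a download link will be emailed shortly'

async def generatePdf():
    await asyncio.sleep(5)  # stand-in for the real PDF work
    # send pdf link to email

app.run()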

However, this is not a good idea if the PDF processing is slow, because the synchronous work will block the asyncio event loop: the current request is answered quickly, but subsequent requests have to wait until the PDF generation is complete.

The correct way is to run the task in an executor (ideally a ProcessPoolExecutor for CPU-bound work).

from quart import Quart
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

app = Quart(__name__)
executor = ProcessPoolExecutor(max_workers=5)

@app.route('/')
async def pdf():
    t1 = time.time()
    asyncio.get_running_loop().run_in_executor(executor, generatePdf)
    # await generatePdf()
    return 'Time to execute : {} seconds'.format(time.time() - t1)

def generatePdf():
    #sync generatepdf
    #send pdf link to email

app.run()

It is important to note that since generatePdf runs in a different process, it cannot share data with the main process without synchronization, so pass everything the function needs as arguments when scheduling it.
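As a rough sketch of passing the request data explicitly (the pdf_id and email parameters here are made up for illustration; positional arguments can go straight into run_in_executor, keyword arguments need functools.partial):

from functools import partial

@app.route('/<pdf_id>')
async def pdf(pdf_id):
    loop = asyncio.get_running_loop()
    # arguments are pickled and sent to the worker process,
    # so pass plain data rather than shared objects
    loop.run_in_executor(executor, partial(generatePdf, pdf_id, email='user@example.com'))
    return 'PDF generation scheduled'

def generatePdf(pdf_id, email):
    # build the PDF for pdf_id and mail the download link to `email`
    pass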


Update

If you can refactor the generatePdf function and make it async, it works best.

For example, if the PDF generation currently looks like this:

def generatePdf():
    image1 = downloadImage(image1Url)
    image2 = downloadImage(image2Url)
    data = queryData()
    pdfFile = makePdf(image1, image2, data)
    link = upLoadToS3(pdfFile)
    sendEmail(link)

You can make the function async like:

async def generatePdf():
    # download both images and query the database concurrently
    image1, image2, data = await asyncio.gather(
        downloadImage(image1Url),
        downloadImage(image2Url),
        queryData(),
    )
    pdfFile = makePdf(image1, image2, data)
    link = await upLoadToS3(pdfFile)
    await sendEmail(link)

Note: all the helper functions like downloadImage and queryData need to be rewritten to support async. This way requests won't be blocked even if the database or image servers are slow; everything runs on the same asyncio event loop.

If some of them are not yet async, they can be wrapped with run_in_executor and will work fine alongside the other async functions, as shown below.
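For instance, a sketch assuming makePdf is still a blocking, CPU-bound call (the executor here is the ProcessPoolExecutor created earlier in this answer):

async def generatePdf():
    image1, image2, data = await asyncio.gather(
        downloadImage(image1Url),
        downloadImage(image2Url),
        queryData(),
    )
    loop = asyncio.get_running_loop()
    # makePdf is still synchronous and CPU-bound, so hand it off to the process
    # pool; its arguments and return value must be picklable
    pdfFile = await loop.run_in_executor(executor, makePdf, image1, image2, data)
    link = await upLoadToS3(pdfFile)
    await sendEmail(link)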

answered Oct 19 '22 by balki

  1. I highly recommend reviewing this explanatory article by Brad Solomon on parallel programming and asyncio in Python.
  2. For the purpose of performing a task asynchronously, without blocking the request until the task is complete, I think the best option is a queue with a "PDFGenerator" class that consumes from it (a pattern also covered in the article); a minimal sketch follows.
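A minimal sketch of that queue pattern, assuming an in-process asyncio.Queue, Quart's before_serving hook to start the worker, and a hypothetical PDFGenerator class (the names are illustrative, not from the answer):

from quart import Quart
import asyncio

app = Quart(__name__)
pdf_queue = asyncio.Queue()

class PDFGenerator:
    async def run(self):
        while True:
            job = await pdf_queue.get()        # wait for the next request
            await self.generate_and_email(job)
            pdf_queue.task_done()

    async def generate_and_email(self, job):
        # build the PDF described by `job` and email the download link
        ...

@app.before_serving
async def start_worker():
    # launch the consumer alongside the web server
    asyncio.create_task(PDFGenerator().run())

@app.route('/')
async def pdf():
    # enqueue the work and respond immediately
    await pdf_queue.put({'requested_at': asyncio.get_running_loop().time()})
    return 'PDF generation queued'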
answered Oct 20 '22 by 0e1val