I'm still very confused about how asyncio works, so I tried to set up a simple example but couldn't get it to work.
The following example is a web server (Quart) that receives a request to generate a large PDF. The server should return a response before it starts processing the PDF, then process it and send the download link by email later.
from quart import Quart
import asyncio
import time

app = Quart(__name__)

@app.route('/')
async def pdf():
    t1 = time.time()
    await generatePdf()
    return 'Time to execute : {} seconds'.format(time.time() - t1)

async def generatePdf():
    await asyncio.sleep(5)
    #sync generatepdf
    #send pdf link to email

app.run()
How would I go about this? In the above example, I don't want to wait the 5 seconds before returning.
I'm not even sure if asyncio is what I need.
I'm also afraid that blocking the server app after the response has been returned is something that shouldn't be done, but I'm not sure about that either.
Also the pdf library is synchronous, but I guess that's a problem for another day...
The comment has everything you need to respond to the web request and schedule the pdf generation for later.
asyncio.create_task(generatePdf())
However, this is not a good idea if the PDF processing is slow and synchronous, because it will block the asyncio event loop thread: the current request gets its response quickly, but the following requests have to wait until the PDF generation is complete.
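The blocking effect described above can be checked with a small experiment. This is only a sketch: `blocking_job`, `cooperative_job` and `latency_of_next_request` are made-up names, and `time.sleep(0.3)` stands in for a synchronous PDF library call.

```python
import asyncio
import time


async def blocking_job():
    # Stand-in for a synchronous PDF call: time.sleep never yields
    # control back to the event loop.
    time.sleep(0.3)


async def cooperative_job():
    # Yields control to the event loop while waiting.
    await asyncio.sleep(0.3)


async def latency_of_next_request(job):
    """Schedule job in the background, then time a trivial 'next request'."""
    task = asyncio.create_task(job())
    start = time.perf_counter()
    await asyncio.sleep(0)  # the 'next request' only needs one loop turn
    elapsed = time.perf_counter() - start
    await task  # let the background job finish before returning
    return elapsed
```

With `blocking_job`, the measured latency should be roughly the full 0.3 seconds, because the loop cannot serve anything else while `time.sleep` runs. With `cooperative_job`, it should be close to zero.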
The correct way is to run the task in an executor (preferably a ProcessPoolExecutor).
from quart import Quart
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

app = Quart(__name__)
executor = ProcessPoolExecutor(max_workers=5)

@app.route('/')
async def pdf():
    t1 = time.time()
    # Schedule the work in a worker process; the returned future is not awaited
    asyncio.get_running_loop().run_in_executor(executor, generatePdf)
    # await generatePdf()
    return 'Time to execute : {} seconds'.format(time.time() - t1)

def generatePdf():
    #sync generatepdf
    #send pdf link to email
    pass

if __name__ == '__main__':
    # The guard keeps worker processes from starting the server when they import this module
    app.run()
It is important to note that, since it runs in a different process, generatePdf cannot access any of the server's data without synchronization. So pass everything the function needs as arguments when calling it.
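As a rough illustration of passing everything explicitly, `run_in_executor` forwards any positional arguments placed after the callable. The names `generate_pdf`, `schedule_pdf`, `user_email` and `image_urls` below are hypothetical, and the function body is only a placeholder for the real work. With a process pool, the arguments and the return value have to be picklable.

```python
import asyncio
from concurrent.futures import Executor


def generate_pdf(user_email, image_urls):
    # Hypothetical stand-in for the real synchronous work. It receives all
    # of its inputs as arguments instead of reading server state.
    return "PDF with {} images for {}".format(len(image_urls), user_email)


async def schedule_pdf(executor: Executor, user_email, image_urls):
    loop = asyncio.get_running_loop()
    # Positional arguments after the callable are forwarded to it.
    future = loop.run_in_executor(executor, generate_pdf, user_email, image_urls)
    # Awaited here only to show the result; a request handler that wants to
    # respond immediately would skip this await.
    return await future
```

Passing a `ProcessPoolExecutor` as `executor` gives the behaviour described above; a `ThreadPoolExecutor` also works with the same call and is handy for trying the sketch out.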
Update
If you can refactor the generatePdf function and make it async, that works best.
For example, if generatePdf looks like this:
def generatePdf():
    image1 = downloadImage(image1Url)
    image2 = downloadImage(image2Url)
    data = queryData()
    pdfFile = makePdf(image1, image2, data)
    link = upLoadToS3(pdfFile)
    sendEmail(link)
You can make the function async like:
async def generatePdf():
    image1, image2, data = await asyncio.gather(downloadImage(image1Url), downloadImage(image2Url), queryData())
    pdfFile = makePdf(image1, image2, data)
    link = await upLoadToS3(pdfFile)
    await sendEmail(link)
Note: All the helper functions, such as downloadImage and queryData, need to be rewritten to support async. This way, requests won't be blocked even if the database or image servers are slow. Everything runs in the same asyncio thread.
If some of them are not yet async, they can be used with run_in_executor and should work well alongside the other async functions.
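A minimal sketch of that mix, assuming the download and query helpers are already async while the PDF builder is still synchronous. All function names and the sleeps are placeholders, not the real helpers.

```python
import asyncio
import time


def make_pdf_sync(image, data):
    # Hypothetical synchronous helper that has not been ported to async yet.
    time.sleep(0.1)
    return "pdf({}, {})".format(image, data)


async def download_image(url):
    # Hypothetical async helper; the sleep stands in for network I/O.
    await asyncio.sleep(0.1)
    return "image:" + url


async def query_data():
    # Hypothetical async helper; the sleep stands in for a database query.
    await asyncio.sleep(0.1)
    return "data"


async def generate_pdf(image_url):
    loop = asyncio.get_running_loop()
    # The async helpers run concurrently on the event loop.
    image, data = await asyncio.gather(download_image(image_url), query_data())
    # Passing None selects the loop's default thread pool, so the blocking
    # call runs off the event loop thread.
    return await loop.run_in_executor(None, make_pdf_sync, image, data)
```

Calling `asyncio.run(generate_pdf("a.png"))` runs the whole pipeline. For CPU-heavy PDF rendering, a process pool can be passed in place of `None`, under the same constraints on arguments as before.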