FastAPI @repeat_every how to prevent parallel def scheduled_task() instances

Tags:

fastapi

We are using fastapi-utils to run a scheduled task in the background. Every 5 seconds we check whether new data is available in the DB; if so, we process it (which can take up to 5 minutes).

During this time, the coroutine should block so that it is only triggered once.

We noticed that our data is sometimes processed 3x; we assume the scheduler keeps firing even while the function is still running.

Therefore we tried to work around it with the IsRunningQuery variable.

We also tried a solution with a while True loop and without @repeat_every, so that it runs only once at startup, but Azure Web Apps does not allow running this.

@app.on_event("startup") 
@repeat_every(wait_first=True,seconds=int(10))
def scheduled_task() -> None:
    global IsRunningQuery
    global LastCheck
    if IsRunningQuery == False:
        IsRunningQuery = True
        gunicorn_logger.info("status='checkforleads'")
        OurProccessingClass.processDataBaseData() # can take up to 5 minutes
        LastCheck=Utils.datetime()
        IsRunningQuery = False

This variant works in our DEV environment, but not on Azure:

@app.on_event("startup") 
async def scheduled_task() -> None:
    while True:
        gunicorn_logger.info("status='checkforleads'")
        OurProccessingClass.processDataBaseData() # can take up to 5 minutes
        time.sleep(int(os.environ["CRM_SLEEP"]))
user670186 asked Nov 16 '22

1 Answer

To accomplish this, you need some locking system, but one that is suitable for your environment.

For example, when running only a single worker with a single async loop, a simple Lock from the asyncio synchronization primitives would be ideal...
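
A minimal sketch of that single-worker case, assuming fastapi-utils' repeat_every and an asyncio.Lock; process_database_data is a placeholder standing in for OurProccessingClass.processDataBaseData, and it is pushed onto a worker thread via run_in_executor so the blocking work does not stall the event loop:

import asyncio

from fastapi import FastAPI
from fastapi_utils.tasks import repeat_every

app = FastAPI()
task_lock = asyncio.Lock()


def process_database_data() -> None:
    ...  # placeholder for the long-running DB processing


@app.on_event("startup")
@repeat_every(seconds=10, wait_first=True)
async def scheduled_task() -> None:
    if task_lock.locked():
        return  # a previous run is still busy; skip this tick
    async with task_lock:
        loop = asyncio.get_running_loop()
        # run the blocking call in a thread so the event loop stays responsive
        await loop.run_in_executor(None, process_database_data)

Once the processing finishes, the lock is released and the next tick runs normally.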

But if you want to introduce more workers, the state of the lock won't be synchronized between the instances. If your workers are spawned on the same system, you can use a file system lock (for example via the fcntl module), but that in turn stops working once you introduce more server instances.
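
A sketch of that file-system variant, assuming all workers run on the same POSIX host; the lock file path is an arbitrary, hypothetical choice:

import fcntl

LOCK_PATH = "/tmp/scheduled_task.lock"  # hypothetical location


def try_acquire_file_lock():
    """Return an open handle holding the lock, or None if another process has it."""
    handle = open(LOCK_PATH, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return handle
    except BlockingIOError:
        handle.close()
        return None


def release_file_lock(handle) -> None:
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()

In the scheduled task you would call try_acquire_file_lock(), simply return if it yields None, and release the handle in a finally block once processing is done.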

The next step may be to introduce a lock at the database level, or in any other external system that can manage a lock or deliver a task to exactly one recipient, but this gets very complicated very quickly.
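
One way to sketch such a database-level lock, assuming PostgreSQL and psycopg2; pg_try_advisory_lock returns immediately instead of waiting, so an instance that loses the race simply skips the run:

import psycopg2

LOCK_KEY = 874230  # arbitrary application-wide key for this task


def run_if_not_already_running(dsn: str) -> bool:
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_try_advisory_lock(%s)", (LOCK_KEY,))
            if not cur.fetchone()[0]:
                return False  # another instance holds the lock
            try:
                ...  # the long-running DB processing goes here
                return True
            finally:
                cur.execute("SELECT pg_advisory_unlock(%s)", (LOCK_KEY,))
    finally:
        conn.close()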

That's why there are systems like Celery that let you schedule tasks and take care of preventing, where possible, a task from being executed multiple times. Note that this is not always possible: the executor may, for example, finish the task but never update its state because of a fatal error or some other interruption, such as a power loss. Such systems can therefore guarantee either that a task runs at least once or that it runs at most once, but never both; they can only do their best to maximize the chances of the other property.
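
For completeness, a rough sketch of the Celery route, assuming a Redis broker at the URL shown; Celery beat fires the task every 10 seconds and a single worker started with --concurrency=1 drains the queue, so two runs cannot overlap on that worker:

from celery import Celery

celery_app = Celery("tasks", broker="redis://localhost:6379/0")

celery_app.conf.beat_schedule = {
    "check-for-leads": {
        "task": "tasks.process_database_data",
        "schedule": 10.0,  # seconds
    },
}


@celery_app.task(name="tasks.process_database_data")
def process_database_data() -> None:
    ...  # the long-running DB processing goes here

Even then, as noted above, the delivery guarantee is at-least-once or at-most-once, never both.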

GwynBleidD answered Dec 09 '22