Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Django and a Database with write-Instance + Multiple Read Replicas -- running Celery jobs

I have a django app running in production. Its database has main write instance and a few read replicas. I use DATABASE_ROUTERS to route between the write instance and the read replicas based on whether I need to read or write.

I encountered a situation where I have to do some async processing on an object due to a user request. The order of actions is:

  1. User submits a request via HTTPS/REST.
  2. The view creates an Object and saves it to the DB.
  3. Trigger a celery job to process the object outside of the request-response cycle and passing the object ID to it.
  4. Sending an OK response to the request.

Now, the celery job may kick in in 10 ms or 10 minutes depending on the queue. When it finally tuns, the celery job first tries to load the object based on the ID provided. Initially I had issues doing a my_obj = MyModel.objects.get(pk=given_id) because the read replica would be used at this point, if the queue is empty and the celery job runs immediately after being triggered, the object may have not propagated to the read-replicas yet.

I resolved that issue by replacing my_obj = MyModel.objects.get(pk=given_id) with my_obj = MyModel.objects.using('default').get(pk=given_id) -- this ensures the object is read from my write-db-instance and is always available.

however, now I have another issue I did not anticipate.

calling my_obj.certain_many_to_many_objects.all() triggers another call to the database as the ORM is lazy. That call IS being done on the read-replica. I was hoping it would stick to the database I defined with using but that's not the case. Is there a way to force all sub-element objects to use the same write-db-instance?

like image 351
JasonGenX Avatar asked Sep 02 '21 17:09

JasonGenX


2 Answers

I suspect your custom database router needs a tweak. The default behaviour without a custom router should provide the database stickiness you require

The default routing scheme ensures that objects remain ‘sticky’ to their original database (i.e., an object retrieved from the foo database will be saved on the same database). [...] You don’t have to do anything to activate the default routing scheme – it is provided ‘out of the box’ on every Django project.

From Automatic DB Routing

So your DB router just needs to offer this behaviour upfront, as probably being the Right Thing To Do in 99.9% of cases.

def db_for_read(model, **hints):
    instance = hints.get('instance')
    if instance is not None and instance._state.db:
        return instance._state.db
    # else return your read replica
    return 'read-only'  # or whatever it's called

See django/db/utils.py

like image 104
Steven Avatar answered Oct 19 '22 18:10

Steven


Doesn't using my_obj.certain_many_to_many_objects.all().using('default') work?

the .all() returns a queryset, so you should be able to add the .using(..) part for it with it working.

like image 1
DanielM Avatar answered Oct 19 '22 18:10

DanielM