
Django: object creation in atomic transaction

I have a simple Task model:

from django.db import models

class Task(models.Model):
    name = models.CharField(max_length=255)
    order = models.IntegerField(db_index=True)

And a simple task_create view:

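from django.db.models import F
from django.http import HttpResponse

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')

    # shift every task at or after the requested position by one
    Task.objects.filter(order__gte=order).update(order=F('order') + 1)
    new_task = Task.objects.create(name=name, order=order)

    return HttpResponse(new_task.id)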

The view shifts the order of every existing task that comes at or after the new task's position by 1, then creates the new task.

Lots of users call this view, and I suspect the ordering will go wrong one day, because the update and the create definitely should be performed together.

So I just want to be sure: is the following enough to avoid any data corruption?

from django.db import transaction
from django.db.models import F
from django.http import HttpResponse

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')

    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)

    return HttpResponse(new_task.id)

1) Should something more be done on the task creation line, similar to the select_for_update added before the filter on the existing Task.objects?

2) Does it matter where return HttpResponse() is located? Inside the transaction block or outside?

Thanks a lot!

MaxCore asked Apr 08 '18 01:04



2 Answers

1) Should something more be done on the task creation line, similar to the select_for_update added before the filter on the existing Task.objects?

No - what you have currently looks fine and should work the way you want it to.
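As an illustration of the all-or-nothing behaviour that the atomic block relies on, here is a minimal sketch outside Django, using Python's built-in sqlite3 module as a stand-in. The table layout and the helper names (make_db, insert_task, names_in_order) are assumptions made for this example, and the column is called ord because ORDER is a reserved word in SQL. How concurrent writers block each other depends on your database backend, so treat this as a demonstration of rollback only.

```python
import sqlite3


def make_db():
    # In-memory stand-in for the Task table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, ord INTEGER NOT NULL)"
    )
    return conn


def insert_task(conn, name, order):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so the shift and the insert are applied together or not at all.
    with conn:
        conn.execute("UPDATE task SET ord = ord + 1 WHERE ord >= ?", (order,))
        cur = conn.execute(
            "INSERT INTO task (name, ord) VALUES (?, ?)", (name, order)
        )
    return cur.lastrowid


def names_in_order(conn):
    # Task names sorted by their position.
    return [row[0] for row in conn.execute("SELECT name FROM task ORDER BY ord")]
```

If the insert fails (for example because name is None and the column is NOT NULL), the earlier shift is rolled back as well, and the existing ord values stay as they were.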

2) Does it matter where return HttpResponse() is located? Inside the transaction block or outside?

Yes, it does matter. You need to return a response to the client regardless of whether your transaction was successful or not - so it definitely needs to be outside of the transaction block. If you did it inside the transaction, the client would get a 500 Server Error if the transaction failed.

However, if the transaction fails, there is no new task ID to return in your response. So you probably need to return different responses depending on whether the transaction succeeds, e.g.:

from django.db import IntegrityError, transaction
from django.db.models import F
from django.http import HttpResponse

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')

    try:
        with transaction.atomic():
            Task.objects.select_for_update().filter(order__gte=order).update(
                order=F('order') + 1)
            new_task = Task.objects.create(name=name, order=order)
    except IntegrityError:
        # Transaction failed - return a response notifying the client
        return HttpResponse('Failed to create task, please try again!')

    # If it succeeded, then return a normal response
    return HttpResponse(new_task.id)
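
One case the IntegrityError handler is unlikely to cover: request.POST.get('order') returns a string, or None when the field is missing, and a missing or non-numeric value would typically fail before anything reaches the database. A minimal sketch of checking the value before opening the transaction; parse_order is a hypothetical helper, not part of Django:

```python
def parse_order(raw):
    """Return raw converted to an int, or None if it is missing or not a number."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        # TypeError covers None, ValueError covers strings such as "abc" or ""
        return None
```

The view could call parse_order(request.POST.get('order')) first and return an error response when the result is None, before running the update and create.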

solarissmoke answered Sep 19 '22 05:09


You could also try to change your model so you don't need to update so many other rows when inserting a new one.

For example, you could try something resembling a doubly linked list.

(I used long explicit names for fields and variables here).

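# models.py
from django.db import models

class Task(models.Model):
    name = models.CharField(max_length=255)
    # 'self' makes the ForeignKey point at Task itself; the class name
    # cannot be used here because Task is not defined yet
    task_before_this_one = models.ForeignKey(
        'self',
        null=True,
        blank=True,
        # on_delete is required since Django 2.0; SET_NULL is one reasonable choice
        on_delete=models.SET_NULL,
        related_name='task_before_this_one_set')
    task_after_this_one = models.ForeignKey(
        'self',
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        related_name='tasks_after_this_one_set')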

The task at the top of the queue is the one whose task_before_this_one field is null, and the last task is the one whose task_after_this_one field is null. So to get the first and last tasks of the queue:

# get() raises Task.MultipleObjectsReturned if more than one row matches,
# and Task.DoesNotExist if none does
first_task = Task.objects.get(task_before_this_one=None)
last_task = Task.objects.get(task_after_this_one=None)

When inserting a new instance, you just need to know after which task it should be placed (or, alternatively, before which task). This code should do that:

from django.http import HttpResponse
from django.shortcuts import get_object_or_404

def task_create(request):
    # look up the neighbour first, so a 404 does not leave an orphaned task behind
    task_before = get_object_or_404(
        Task, pk=request.POST.get('task_before_the_new_one'))
    # 'task_after' will be None if 'task_before' is the last one in the queue
    task_after = task_before.task_after_this_one

    # create the new task already linked to both neighbours
    new_task = Task.objects.create(
        name=request.POST.get('name'),
        task_before_this_one=task_before,
        task_after_this_one=task_after)

    # modify the 2 other tasks
    task_before.task_after_this_one = new_task
    task_before.save()
    if task_after is not None:
        task_after.task_before_this_one = new_task
        task_after.save()

    return HttpResponse(new_task.pk)

This method only updates 2 other rows when inserting a new row. You might still want to wrap the whole method in a transaction if there is really high concurrency in your app, but this transaction will only lock up to 3 rows, not all the others as well.

This approach might be of use to you if you have a very long list of tasks.
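
The pointer updates in the view above can be sketched in plain Python, without a database. Node, insert_after and names_from are hypothetical names used only for this illustration; before and after play the role of task_before_this_one and task_after_this_one.

```python
class Node:
    """In-memory stand-in for a Task row in the linked-list layout."""

    def __init__(self, name, before=None, after=None):
        self.name = name
        self.before = before
        self.after = after


def insert_after(task_before, name):
    # The new node is created already linked to both neighbours.
    new = Node(name, before=task_before, after=task_before.after)
    # Only the two neighbours are modified, however long the list is.
    if task_before.after is not None:
        task_before.after.before = new
    task_before.after = new
    return new


def names_from(head):
    # Walk the 'after' links starting at the head of the queue.
    names = []
    node = head
    while node is not None:
        names.append(node.name)
        node = node.after
    return names
```

For example, starting from a single node a, calling insert_after(a, "c") and then insert_after(a, "b") gives the order a, b, c.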


EDIT: how to get an ordered list of tasks

This cannot be done at the database level in a single query (as far as I know), but you could try this function:

def get_ordered_task_list():
    # get the first task
    aux_task = Task.objects.get(task_before_this_one=None)

    task_list = []
    while aux_task is not None:
        task_list.append(aux_task)
        aux_task = aux_task.task_after_this_one

    return task_list

As long as you only have a few hundred tasks, this operation should not take long enough to affect the response time. But you will have to try that out for yourself, in your environment, with your database and your hardware.
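
The loop above issues one query per task each time it follows task_after_this_one. If that turns out to be too slow, one possible alternative is to fetch all the link columns in one query and order them in memory. The sketch below is plain Python and assumes rows of (pk, before_pk, after_pk) tuples, which in Django might come from something like Task.objects.values_list('pk', 'task_before_this_one_id', 'task_after_this_one_id'); order_by_links is a hypothetical helper.

```python
def order_by_links(rows):
    """Return the pks in queue order, given (pk, before_pk, after_pk) rows."""
    after_of = {}
    head = None
    for pk, before_pk, after_pk in rows:
        after_of[pk] = after_pk
        if before_pk is None:
            # the first task is the one with nothing before it
            head = pk

    ordered = []
    seen = set()
    current = head
    # 'seen' guards against an endless loop if the links are corrupted
    while current is not None and current not in seen:
        ordered.append(current)
        seen.add(current)
        current = after_of.get(current)
    return ordered
```

This only returns primary keys; mapping them back to Task instances is left out of the sketch.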

Ralf answered Sep 22 '22 05:09