I have a simple Task model:
class Task(models.Model):
    name = models.CharField(max_length=255)
    order = models.IntegerField(db_index=True)
And a simple task_create view:
def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    Task.objects.filter(order__gte=order).update(order=F('order') + 1)
    new_task = Task.objects.create(name=name, order=order)
    return HttpResponse(new_task.id)
The view shifts the existing tasks that come after the newly created one by +1, then creates the new task.
This method has lots of users, and I suppose the ordering will go wrong one day, because the update and the create definitely should be performed together.
So, I just want to be sure: will the following be enough to avoid any data corruption?
from django.db import transaction

def task_create(request):
    name = request.POST.get('name')
    order = request.POST.get('order')
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
        return HttpResponse(new_task.id)
1) Probably, something more should be done in the task creation line, like select_for_update before filter of the existing Task.objects?
2) Does it matter where return HttpResponse() is located? Inside the transaction block or outside?
Big thx
Django provides a single API to control database transactions. Atomicity is the defining property of database transactions. atomic allows us to create a block of code within which atomicity on the database is guaranteed. If the block of code completes successfully, the changes are committed to the database.
Updating rows one at a time in a Python loop would be neither atomic (other processes could update the same rows while the loop is running) nor efficient (it issues one UPDATE query per iteration). A single queryset .update() avoids the extra queries, but on its own it still does not protect you from concurrent writers.
When you run multiple workers of your Django application, you will run into concurrency issues when the same queryset is updated by different processes at the same time. To prevent this, use select_for_update inside a transaction block to fetch your queryset, so that the rows are locked until the transaction is completed.
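For example, a minimal sketch of that pattern, assuming the Task model from the question (the bump_task helper and its argument are just illustrative, not part of the original code):

from django.db import transaction

def bump_task(task_id):
    with transaction.atomic():
        # SELECT ... FOR UPDATE: this row stays locked until the
        # transaction commits, so concurrent workers block here
        task = Task.objects.select_for_update().get(pk=task_id)
        task.order += 1
        task.save()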
savepoint(using=None): creates a new savepoint. This marks a point in the transaction that is known to be in a "good" state. Returns the savepoint ID (sid).
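As a small illustration of savepoints (a sketch, again assuming the Task model from the question), they let you roll back part of a transaction without abandoning all of it:

from django.db import IntegrityError, transaction

with transaction.atomic():
    Task.objects.create(name='first', order=1)
    sid = transaction.savepoint()
    try:
        # this create might violate a constraint in your schema
        Task.objects.create(name='second', order=2)
        transaction.savepoint_commit(sid)
    except IntegrityError:
        # roll back to the savepoint: 'first' is kept, 'second' is discarded
        transaction.savepoint_rollback(sid)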
1) Probably, something more should be done in the task creation line, like select_for_update before filter of the existing Task.objects?
No - what you have currently looks fine and should work the way you want it to.
2) Does it matter where return HttpResponse() is located? Inside the transaction block or outside?
Yes, it does matter. You need to return a response to the client regardless of whether your transaction was successful or not - so it definitely needs to be outside of the transaction block. If you did it inside the transaction, the client would get a 500 Server Error if the transaction failed.
However, if the transaction fails, you will not have a new task ID and cannot return it in your response. So you probably need to return different responses depending on whether the transaction succeeded, e.g.:
from django.db import IntegrityError, transaction
from django.db.models import F

try:
    with transaction.atomic():
        Task.objects.select_for_update().filter(order__gte=order).update(
            order=F('order') + 1)
        new_task = Task.objects.create(name=name, order=order)
except IntegrityError:
    # Transaction failed - return a response notifying the client
    return HttpResponse('Failed to create task, please try again!')

# If it succeeded, then return a normal response
return HttpResponse(new_task.id)
You could also try to change your model so you don't need to update so many other rows when inserting a new one.
For example, you could try something resembling a doubly-linked list.
(I used long, explicit names for fields and variables here.)
# models.py
from django.db import models

class Task(models.Model):
    name = models.CharField(max_length=255)
    task_before_this_one = models.ForeignKey(
        'self',  # a self-referential FK must use 'self' here
        null=True,
        blank=True,
        on_delete=models.SET_NULL,  # on_delete is required; SET_NULL fits the nullable FK
        related_name='task_before_this_one_set')
    task_after_this_one = models.ForeignKey(
        'self',
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        related_name='tasks_after_this_one_set')
Your task at the top of the queue would be the one that has the field task_before_this_one set to null. So to get the first (and last) task of the queue:
# these raise exceptions if there is no match or more than one
first_task = Task.objects.get(task_before_this_one=None)
last_task = Task.objects.get(task_after_this_one=None)
When inserting a new instance, you just need to know after which task it should be placed (or, alternatively, before which task). This code should do that:
from django.http import HttpResponse
from django.shortcuts import get_object_or_404

def task_create(request):
    new_task = Task.objects.create(
        name=request.POST.get('name'))
    task_before = get_object_or_404(
        Task, pk=request.POST.get('task_before_the_new_one'))
    task_after = task_before.task_after_this_one
    # modify the 2 other tasks
    task_before.task_after_this_one = new_task
    task_before.save()
    if task_after is not None:
        # 'task_after' will be None if 'task_before' is the last one in the queue
        task_after.task_before_this_one = new_task
        task_after.save()
    # update the newly created task
    new_task.task_before_this_one = task_before
    new_task.task_after_this_one = task_after  # this could be None
    new_task.save()
    return HttpResponse(new_task.pk)
This method only updates two other rows when inserting a new row. You might still want to wrap the whole method in a transaction if there is really high concurrency in your app (see the sketch below), but this transaction will only lock up to three rows, not all the others as well.
This approach might be of use to you if you have a very long list of tasks.
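If you do add that transaction, a sketch might look like this (the select_for_update call is my addition, not part of the view above; it locks just the predecessor row until the transaction commits):

from django.db import transaction
from django.http import HttpResponse
from django.shortcuts import get_object_or_404

def task_create(request):
    with transaction.atomic():
        # lock the predecessor until the transaction commits
        task_before = get_object_or_404(
            Task.objects.select_for_update(),
            pk=request.POST.get('task_before_the_new_one'))
        task_after = task_before.task_after_this_one
        new_task = Task.objects.create(
            name=request.POST.get('name'),
            task_before_this_one=task_before,
            task_after_this_one=task_after)
        task_before.task_after_this_one = new_task
        task_before.save()
        if task_after is not None:
            task_after.task_before_this_one = new_task
            task_after.save()
    return HttpResponse(new_task.pk)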
EDIT: how to get an ordered list of tasks
This cannot be done at the database level in a single query (as far as I know), but you could try this function:
def get_ordered_task_list():
    # get the first task
    aux_task = Task.objects.get(task_before_this_one=None)
    task_list = []
    while aux_task is not None:
        task_list.append(aux_task)
        aux_task = aux_task.task_after_this_one
    return task_list
As long as you only have a few hundred tasks, this operation should not take long enough to noticeably impact response time. But you will have to try that out for yourself, in your environment, with your database and your hardware.