I'm facing a strange situation, where sequence of query logged in Django and Postgres is different when using select_for_update()
inside transaction.atomic()
block.
Basically I've a ModelForm
where I'm validating the cleaned_data
against the database, for duplicate request. And then in create view's form_valid()
method, I'm saving the instance. To have both the operation inside same transaction, I'm overriding post()
method, and wrapping those two method calls inside transaction.atomic()
.
Here's the code for whatever I said above:
# Form
class MenuForm(forms.ModelForm):
def __init__(self, *args, **kwargs):
user_id = kwargs.pop('user_id', None)
super(MenuForm, self).__init__(*args, **kwargs)
def clean(self):
cleaned_data = super(MenuForm, self).clean()
dish_name = cleaned_data.get('dish_name')
menus = Menu.objects.select_for_update().filter(user_id=self.user_id)
for menu in menus:
if menu.dish_name == dish_name:
self.add_error('dish_name', 'Dish already exists')
return cleaned_data
return cleaned_data
# CreateView
class MenuCreateView(CreateView):
form_class = MenuForm
def get_form_kwargs(self):
kwargs = super(MenuCreateView, self).get_form_kwargs()
kwargs.update({'user_id': self.request.session.get('user_id')})
return kwargs
def form_valid(self, form):
user = User.objects.get(id=self.request.session.get('user_id'))
form.instance.user = user
return super(MenuCreateView, self).form_valid(form)
def post(self, request, *args, **kwargs):
form = self.get_form()
with transaction.atomic():
if form.is_valid():
return self.form_valid(form)
else:
return self.form_invalid(form)
Now suppose I fire two request at the same time, to create a menu with same dish. I expect the second request to fail. But, both of them are passing. It looks like, the second transaction is not seeing the changes done in previous transaction. Because of which, the total menus
remain the same in both the transaction that is returned by select_for_update()
.
Given that Postgres default Isolation level is READ COMMITTED
, I expect the changes to be visible. So, I tried logging the queries to see that COMMIT; is fired at right time. Here's the query log by django and in postgres:
SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu" WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE; args=("Test Dish")
INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish") RETURNING "menu"."id"; args=(2, "Test Dish")
SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu" WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE; args=("Test Dish")
INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish") RETURNING "menu"."id"; args=(2, "Test Dish")
<2016-03-18 17:55:46.176 IST 0 2/31 56ebf3ca.aac0>LOG: statement: SHOW default_transaction_isolation
<2016-03-18 17:55:46.177 IST 0 2/32 56ebf3ca.aac0>LOG: statement: SET TIME ZONE 'UTC'
<2016-03-18 17:55:46.178 IST 0 2/33 56ebf3ca.aac0>LOG: statement: SELECT t.oid, typarray
FROM pg_type t JOIN pg_namespace ns
ON typnamespace = ns.oid
WHERE typname = 'hstore';
<2016-03-18 17:55:46.182 IST 0 2/34 56ebf3ca.aac0>LOG: statement: BEGIN
<2016-03-18 17:55:46.301 IST 0 3/2 56ebf3ca.aac1>LOG: statement: SHOW default_transaction_isolation
<2016-03-18 17:55:46.302 IST 0 3/3 56ebf3ca.aac1>LOG: statement: SET TIME ZONE 'UTC'
<2016-03-18 17:55:46.302 IST 0 3/4 56ebf3ca.aac1>LOG: statement: SELECT t.oid, typarray
FROM pg_type t JOIN pg_namespace ns
ON typnamespace = ns.oid
WHERE typname = 'hstore';
<2016-03-18 17:55:46.312 IST 0 3/5 56ebf3ca.aac1>LOG: statement: BEGIN
<2016-03-18 17:55:46.963 IST 0 3/5 56ebf3ca.aac1>LOG: statement: SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu"
WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE
<2016-03-18 17:55:46.964 IST 0 2/34 56ebf3ca.aac0>LOG: statement: SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu"
WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE
<2016-03-18 17:55:47.040 IST 23712 3/5 56ebf3ca.aac1>LOG: statement: INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish")RETURNING "menu"."id"
<2016-03-18 17:55:47.061 IST 23712 3/5 56ebf3ca.aac1>LOG: statement: COMMIT
<2016-03-18 17:55:47.229 IST 23713 2/34 56ebf3ca.aac0>LOG: statement: INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish")RETURNING "menu"."id"
<2016-03-18 17:55:47.231 IST 23713 2/34 56ebf3ca.aac0>LOG: statement: COMMIT
max_connections = 100
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_line_prefix = '<%m %x %v %c>'
log_statement = 'all'
As you can see, the order of SELECT and INSERT queries in not the same in both the logs. I'm unable to understand why this would happen. Also, if you notice, the session_id for the SELECT queries in Postgres log is different. Could that explain something here?
And if this is expected behaviour, how could I solve the core issue here? Avoid concurrent INSERT queries, based on existing record.
I didn't mention that the actual logic to ignore duplicate menu is not just based on dish name. The one above is simplified example.
Considering menu model as:
class Menu:
user_id = models.IntegerField()
dish = models.ForeignKey(Dish)
order_start_time = models.DateTimeField()
order_end_time = models.DateTimeField()
The actual logic goes like this:
dish_name
from db.order_start_time
and order_end_time
for all the those menu, and see if any of them overlaps with the order_start_time
and order_end_time
for the new menu. If conflict found, avoid addition.So, we can add two menu for dish - d1
, having order window - [9am-10am]
and [2pm-3pm]
.
Postgresql is the best database for django, as said in django's official documentation.
Django officially supports the following databases: PostgreSQL. MariaDB. MySQL.
The select_for_update method offered by the Django ORM solves the problem of concurrency by returning a queryset that locks all the rows that belong to this queryset until the outermost transaction it is inside gets committed thus preventing data corruption.
This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.
EDITED:
It is possible to add special method for a model validate_unique
:
from django.db import models
from django.core.validators import ValidationError
from django.forms.forms import NON_FIELD_ERRORS
class Dish(models.Model):
name = models.CharField('Dish name', max_length=200)
class Menu(models.Model):
user_id = models.IntegerField()
dish = models.ForeignKey(Dish)
order_start_time = models.DateTimeField()
order_end_time = models.DateTimeField()
def validate_unique(self, *args, **kwargs):
# call inherited unique validators
super().validate_unique(*args, **kwargs) # or super(Menu, self) for Python2.7
# query if DB already has object with same dish
# and overlapping reservation
# [order_start_time, order_end_time]
qs = self.__class__._default_manager.filter(
order_start_time__lte=self.order_end_time,
order_end_time__gte=self.order_start_time,
dish=self.dish,
)
# and this object is not the same we are working with
if not self._state.adding and self.pk is not None:
qs = qs.exclude(pk=self.pk)
if qs.exists():
raise ValidationError({
NON_FIELD_ERRORS: ['Overlapping order dates for dish'],
})
Lets try it in console:
from core.models import *
m=Menu(user_id=1, dish_id=1, order_start_time='2016-03-22 10:00', order_end_time='2016-03-22 15:00')
m.validate_unique()
# no output here - all is ok
m.save()
print(m.id)
8
# lets add duplicate
m=Menu(user_id=1, dish_id=1, order_start_time='2016-03-22 12:00', order_end_time='2016-03-22 13:00')
m.validate_unique()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/el/tmp/hypothesis_test/menu/core/models.py", line 29, in validate_unique
NON_FIELD_ERRORS: ['Overlapping order dates for dish'],
django.core.exceptions.ValidationError: {'__all__': ['Overlapping order dates for dish']}
# excellent! dup is found!
# But! Django helps you find dups but allows you to add them to db if you want it!
# It's responsibility of your application not to add duplicates.
m.save()
print(m.id)
9
In this case you need to make a CONSTRAINT on database-level.
In PostgreSQL console:
CREATE EXTENSION btree_gist;
-- our table:
SELECT * FROM core_menu;
id | user_id | order_start_time | order_end_time | dish_id
----+---------+------------------------+------------------------+---------
8 | 1 | 2016-03-22 13:00:00+03 | 2016-03-22 18:00:00+03 | 1
9 | 1 | 2016-03-22 15:00:00+03 | 2016-03-22 16:00:00+03 | 1
DELETE FROM core_menu WHERE id=9; -- we should remove dups before adding unique constraint
ALTER TABLE core_menu
ADD CONSTRAINT core_menu_exclude_dish_same_tstzrange_constr
EXCLUDE USING gist (dish_id WITH =, tstzrange(order_start_time, order_end_time) WITH &&);
Now lets create the duplicate object and add it to db:
m=Menu(user_id=1, dish_id=1, order_start_time='2016-03-22 13:00', order_end_time='2016-03-22 14:00')
m.save()
Traceback (most recent call last):
File "/Users/el/tmp/hypothesis_test/venv/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
return self.cursor.execute(sql, params)
psycopg2.IntegrityError: ОШИБКА: конфликтующее значение ключа нарушает ограничение-исключение "core_menu_exclude_dish_same_tstzrange_constr"
DETAIL: Key (dish_id, tstzrange(order_start_time, order_end_time))=(1, ["2016-03-22 13:00:00+00","2016-03-22 14:00:00+00")) conflicts with existing key (dish_id, tstzrange(order_start_time, order_end_time))=(1, ["2016-03-22 10:00:00+00","2016-03-22 15:00:00+00")).
Excellent! Now data is validated at program and db levels.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With