Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Difference in sequence of query generated in Django and Postgres for select_for_update

I'm facing a strange situation, where sequence of query logged in Django and Postgres is different when using select_for_update() inside transaction.atomic() block.

Basically I've a ModelForm where I'm validating the cleaned_data against the database, for duplicate request. And then in create view's form_valid() method, I'm saving the instance. To have both the operation inside same transaction, I'm overriding post() method, and wrapping those two method calls inside transaction.atomic().

Here's the code for whatever I said above:

# Form
class MenuForm(forms.ModelForm):
    def __init__(self, *args, **kwargs):
        user_id = kwargs.pop('user_id', None)
        super(MenuForm, self).__init__(*args, **kwargs)

    def clean(self):
        cleaned_data = super(MenuForm, self).clean()
        dish_name = cleaned_data.get('dish_name')
        menus = Menu.objects.select_for_update().filter(user_id=self.user_id)

        for menu in menus:
            if menu.dish_name == dish_name:
                self.add_error('dish_name', 'Dish already exists')
                return cleaned_data
        return cleaned_data

# CreateView
class MenuCreateView(CreateView):
    form_class = MenuForm

    def get_form_kwargs(self):
        kwargs = super(MenuCreateView, self).get_form_kwargs()
        kwargs.update({'user_id': self.request.session.get('user_id')})
        return kwargs

    def form_valid(self, form):
        user = User.objects.get(id=self.request.session.get('user_id'))
        form.instance.user = user
        return super(MenuCreateView, self).form_valid(form)

    def post(self, request, *args, **kwargs):
        form = self.get_form()

        with transaction.atomic():
            if form.is_valid():
                return self.form_valid(form)
            else:
                return self.form_invalid(form)

Now suppose I fire two request at the same time, to create a menu with same dish. I expect the second request to fail. But, both of them are passing. It looks like, the second transaction is not seeing the changes done in previous transaction. Because of which, the total menus remain the same in both the transaction that is returned by select_for_update().

Given that Postgres default Isolation level is READ COMMITTED, I expect the changes to be visible. So, I tried logging the queries to see that COMMIT; is fired at right time. Here's the query log by django and in postgres:

Django Log:

SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu" WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE; args=("Test Dish")    
INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish") RETURNING "menu"."id"; args=(2, "Test Dish")
SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu" WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE; args=("Test Dish")
INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish") RETURNING "menu"."id"; args=(2, "Test Dish")

Postgres Log:

<2016-03-18 17:55:46.176 IST  0  2/31  56ebf3ca.aac0>LOG:  statement: SHOW default_transaction_isolation
<2016-03-18 17:55:46.177 IST  0  2/32  56ebf3ca.aac0>LOG:  statement: SET TIME ZONE 'UTC'
<2016-03-18 17:55:46.178 IST  0  2/33  56ebf3ca.aac0>LOG:  statement: SELECT t.oid, typarray
    FROM pg_type t JOIN pg_namespace ns
        ON typnamespace = ns.oid
    WHERE typname = 'hstore';

<2016-03-18 17:55:46.182 IST  0  2/34  56ebf3ca.aac0>LOG:  statement: BEGIN
<2016-03-18 17:55:46.301 IST  0  3/2  56ebf3ca.aac1>LOG:  statement: SHOW default_transaction_isolation
<2016-03-18 17:55:46.302 IST  0  3/3  56ebf3ca.aac1>LOG:  statement: SET TIME ZONE 'UTC'
<2016-03-18 17:55:46.302 IST  0  3/4  56ebf3ca.aac1>LOG:  statement: SELECT t.oid, typarray
    FROM pg_type t JOIN pg_namespace ns
        ON typnamespace = ns.oid
    WHERE typname = 'hstore';

<2016-03-18 17:55:46.312 IST  0  3/5  56ebf3ca.aac1>LOG:  statement: BEGIN
<2016-03-18 17:55:46.963 IST  0  3/5  56ebf3ca.aac1>LOG:  statement: SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu" 
WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE
<2016-03-18 17:55:46.964 IST  0  2/34  56ebf3ca.aac0>LOG:  statement: SELECT "menu"."id", "menu"."dish_id", "menu"."dish_name" FROM "menu" 
WHERE ("menu"."dish_name" = "Test Dish") FOR UPDATE
<2016-03-18 17:55:47.040 IST  23712  3/5  56ebf3ca.aac1>LOG:  statement: INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish")RETURNING "menu"."id"
<2016-03-18 17:55:47.061 IST  23712  3/5  56ebf3ca.aac1>LOG:  statement: COMMIT
<2016-03-18 17:55:47.229 IST  23713  2/34  56ebf3ca.aac0>LOG:  statement: INSERT INTO "menu" ("dish_id", "dish_name") VALUES (2, "Test Dish")RETURNING "menu"."id"
<2016-03-18 17:55:47.231 IST  23713  2/34  56ebf3ca.aac0>LOG:  statement: COMMIT

Postgres.conf:

max_connections = 100
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
log_line_prefix = '<%m  %x  %v  %c>'
log_statement = 'all'

As you can see, the order of SELECT and INSERT queries in not the same in both the logs. I'm unable to understand why this would happen. Also, if you notice, the session_id for the SELECT queries in Postgres log is different. Could that explain something here?

And if this is expected behaviour, how could I solve the core issue here? Avoid concurrent INSERT queries, based on existing record.

UPDATE:

I didn't mention that the actual logic to ignore duplicate menu is not just based on dish name. The one above is simplified example.

Considering menu model as:

class Menu:
    user_id = models.IntegerField()
    dish = models.ForeignKey(Dish)
    order_start_time = models.DateTimeField()
    order_end_time = models.DateTimeField()

The actual logic goes like this:

  • Fetch all menu with the dish_name from db.
  • Check the order_start_time and order_end_time for all the those menu, and see if any of them overlaps with the order_start_time and order_end_time for the new menu. If conflict found, avoid addition.

So, we can add two menu for dish - d1, having order window - [9am-10am] and [2pm-3pm].

like image 907
Rohit Jain Avatar asked Mar 18 '16 14:03

Rohit Jain


People also ask

Is PostgreSQL good for Django?

Postgresql is the best database for django, as said in django's official documentation.

Which database is best for Django?

Django officially supports the following databases: PostgreSQL. MariaDB. MySQL.

What is Select_for_update in Django?

The select_for_update method offered by the Django ORM solves the problem of concurrency by returning a queryset that locks all the rows that belong to this queryset until the outermost transaction it is inside gets committed thus preventing data corruption.

Why are QuerySets considered lazy?

This is because a Django QuerySet is a lazy object. It contains all of the information it needs to populate itself from the database, but will not actually do so until the information is needed.


1 Answers

EDITED:

How to make Django validate overlapping reservations?

It is possible to add special method for a model validate_unique:

    from django.db import models
    from django.core.validators import ValidationError
    from django.forms.forms import NON_FIELD_ERRORS

    class Dish(models.Model):
        name = models.CharField('Dish name', max_length=200)

    class Menu(models.Model):
        user_id = models.IntegerField()
        dish = models.ForeignKey(Dish)
        order_start_time = models.DateTimeField()
        order_end_time = models.DateTimeField()

        def validate_unique(self, *args, **kwargs):
            # call inherited unique validators
            super().validate_unique(*args, **kwargs)  # or super(Menu, self) for Python2.7
            # query if DB already has object with same dish 
            # and overlapping reservation 
            # [order_start_time, order_end_time]
            qs = self.__class__._default_manager.filter(
                    order_start_time__lte=self.order_end_time,
                    order_end_time__gte=self.order_start_time,
                    dish=self.dish,
                )
            # and this object is not the same we are working with
            if not self._state.adding and self.pk is not None:
                qs = qs.exclude(pk=self.pk)
            if qs.exists():
                raise ValidationError({
                    NON_FIELD_ERRORS: ['Overlapping order dates for dish'],
                    })

Lets try it in console:

    from core.models import *
    m=Menu(user_id=1, dish_id=1, order_start_time='2016-03-22 10:00', order_end_time='2016-03-22 15:00')
    m.validate_unique()
    # no output here - all is ok
    m.save()
    print(m.id)
    8

    # lets add duplicate
    m=Menu(user_id=1, dish_id=1, order_start_time='2016-03-22 12:00', order_end_time='2016-03-22 13:00')
    m.validate_unique()
    Traceback (most recent call last):
       File "<console>", line 1, in <module>
       File "/Users/el/tmp/hypothesis_test/menu/core/models.py", line 29, in validate_unique
           NON_FIELD_ERRORS: ['Overlapping order dates for dish'],
    django.core.exceptions.ValidationError: {'__all__': ['Overlapping order dates for dish']}

    # excellent!  dup is found!

    # But! Django helps you find dups but allows you to add them to db if you want it!
    # It's responsibility of your application not to add duplicates.

    m.save()
    print(m.id)
    9

How to be sure noboby can add duplicate?

In this case you need to make a CONSTRAINT on database-level.

In PostgreSQL console:

    CREATE EXTENSION btree_gist;

    -- our table:
    SELECT * FROM core_menu;
    id | user_id |    order_start_time    |     order_end_time     | dish_id
   ----+---------+------------------------+------------------------+---------
     8 |       1 | 2016-03-22 13:00:00+03 | 2016-03-22 18:00:00+03 |       1
     9 |       1 | 2016-03-22 15:00:00+03 | 2016-03-22 16:00:00+03 |       1

    DELETE FROM core_menu WHERE id=9; -- we should remove dups before adding unique constraint

    ALTER TABLE core_menu 
        ADD CONSTRAINT core_menu_exclude_dish_same_tstzrange_constr 
            EXCLUDE USING gist (dish_id WITH =, tstzrange(order_start_time, order_end_time)  WITH &&);

Now lets create the duplicate object and add it to db:

    m=Menu(user_id=1, dish_id=1, order_start_time='2016-03-22 13:00', order_end_time='2016-03-22 14:00')

    m.save()
    Traceback (most recent call last):
       File "/Users/el/tmp/hypothesis_test/venv/lib/python3.5/site-packages/django/db/backends/utils.py", line 64, in execute
         return self.cursor.execute(sql, params)
    psycopg2.IntegrityError: ОШИБКА:  конфликтующее значение ключа нарушает ограничение-исключение "core_menu_exclude_dish_same_tstzrange_constr"
    DETAIL:  Key (dish_id, tstzrange(order_start_time, order_end_time))=(1, ["2016-03-22 13:00:00+00","2016-03-22 14:00:00+00")) conflicts with existing key (dish_id, tstzrange(order_start_time, order_end_time))=(1, ["2016-03-22 10:00:00+00","2016-03-22 15:00:00+00")).

Excellent! Now data is validated at program and db levels.

like image 190
Eugene Lisitsky Avatar answered Oct 10 '22 22:10

Eugene Lisitsky