
Django QuerySet update_or_create creating duplicate entries

I've recently been running into problems with the update_or_create method. Let me give a full explanation first.

Model:

from django.db import models

# Transaction is another model defined elsewhere in the project.


class TransactionPageVisits(models.Model):
    transactionid = models.ForeignKey(
        Transaction,
        on_delete=models.CASCADE,
        db_column='transactionid',
    )
    sessionid = models.CharField(max_length=40, db_index=True)
    ip_address = models.CharField(max_length=39, editable=False)
    user_agent = models.TextField(null=True, editable=False)
    page = models.CharField(max_length=100, null=True, db_index=True)
    method = models.CharField(max_length=20, null=True)
    url = models.TextField(null=False, editable=False)
    created_dtm = models.DateTimeField(auto_now_add=True)

    class Meta(object):
        ordering = ('created_dtm',)

Function:

def _tracking(self, request, response, **kwargs):
    txn_details = kwargs.get('txn_details')
    data = {
        'sessionid': request.session.session_key,
        'ip_address': get_ip_address(request),
        'user_agent': get_user_agent(request),
        'method': request.method,
        'url': request.build_absolute_uri(),
        'transactionid': txn_details.txn_object,
        'page': kwargs.get('page')
    }

    # Create or update the tracking row; with no defaults given,
    # every key in data is used as part of the lookup
    obj, created = TransactionPageVisits.objects.update_or_create(**data)

Notes:

I know I'm not passing a defaults argument to update_or_create(). When the code was written it wasn't needed: I wanted a new row to be created only when the combination of all the values in data did not already exist. Also, _tracking() lives in middleware and is called on every request and response.

Everything was going smoothly until today, when I got the following exception:

File "trackit.py", line 65, in _tracking
    obj, created = TransactionPageVisits.objects.update_or_create(**data)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.10.4-py2.7.egg/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.10.4-py2.7.egg/django/db/models/query.py", line 488, in update_or_create
    obj = self.get(**lookup)
  File "/usr/local/lib/python2.7/dist-packages/Django-1.10.4-py2.7.egg/django/db/models/query.py", line 389, in get
    (self.model._meta.object_name, num)
MultipleObjectsReturned: get() returned more than one TransactionPageVisits -- it returned 2!

I noticed that two entries had been created in the table with exactly the same values (except created_dtm, since it has auto_now_add=True):

| id    | sessionid                        | ip_address     | user_agent                                                                     | page | method | url                                                                                                    | created_dtm                | transactionid |
| 32858 | nrq2vwxbtsjp8yoibotpsur0zit5jhoq | xx.xxx.xxx.xxx | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0 |      | GET    | https://www.example.com/example_url/?jobid=5a9f2acb4cedfd00011c7d5d&transactionid=XXXXXXXXXXXX | 2018-03-06 23:57:00.061280 | XXXXXXXXXXXX  |
| 32859 | nrq2vwxbtsjp8yoibotpsur0zit5jhoq | xx.xxx.xxx.xxx | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0 |      | GET    | https://www.example.com/example_url/?jobid=5a9f2acb4cedfd00011c7d5d&transactionid=XXXXXXXXXXXX | 2018-03-06 23:57:00.062121 | XXXXXXXXXXXX  |

Why was a duplicate entry created in the table in the first place?

Saurav Kumar asked Mar 07 '18 12:03




1 Answer

update_or_create is prone to a race condition, as described in the documentation:

As described above in get_or_create(), this method is prone to a race-condition which can result in multiple rows being inserted simultaneously if uniqueness is not enforced at the database level.
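To make the race concrete, here is a minimal sketch that uses the standard-library sqlite3 module rather than Django's ORM. The visits table, its two columns, and the exists()/insert() helpers are made up for illustration; they only stand in for the "get" and "create" halves of update_or_create. The interleaving is forced by hand (both lookups run before either insert), which is roughly what can happen when two requests arrive less than a millisecond apart, as in the rows shown in the question.

```python
import sqlite3

# In-memory table standing in for the tracking table.
# Note: there is no UNIQUE constraint on it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (id INTEGER PRIMARY KEY, sessionid TEXT, url TEXT)"
)

# Hypothetical values for one page visit.
row = ("nrq2vwxbtsjp8yoibotpsur0zit5jhoq", "https://www.example.com/example_url/")


def exists(values):
    """The 'get' half: is there already a row matching every field?"""
    cur = conn.execute(
        "SELECT COUNT(*) FROM visits WHERE sessionid = ? AND url = ?", values
    )
    return cur.fetchone()[0] > 0


def insert(values):
    """The 'create' half: add a new row."""
    conn.execute("INSERT INTO visits (sessionid, url) VALUES (?, ?)", values)


# Two requests interleave: both finish their lookup before either inserts,
# so both conclude that the row is missing.
request_a_found = exists(row)
request_b_found = exists(row)
if not request_a_found:
    insert(row)
if not request_b_found:
    insert(row)

duplicate_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(duplicate_count)  # 2
```

Once two identical rows exist, any later lookup on those fields matches both of them, which is the situation that makes get() raise MultipleObjectsReturned.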

You can use unique_together in the model, as suggested in another answer. I've never tested this, but apparently Django catches the IntegrityError caused by these race conditions.
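As a rough sketch of what database-level uniqueness buys you, the following again uses the standard-library sqlite3 module instead of Django. The visits table, its columns, and the get_or_create() helper are hypothetical; the helper only imitates the general pattern of "try the insert, and fall back to a lookup if the database rejects it", and is not Django's actual implementation. One caveat to check before applying this to the model in the question: some backends (MySQL, for example) cannot put a plain TEXT column such as url or user_agent into a unique index without extra work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits ("
    " id INTEGER PRIMARY KEY,"
    " sessionid TEXT,"
    " url TEXT,"
    " UNIQUE (sessionid, url))"  # uniqueness enforced by the database
)


def get_or_create(values):
    """Return (row_id, created); fall back to a lookup if the insert collides."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO visits (sessionid, url) VALUES (?, ?)", values
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another writer got there first; fetch the row it created.
        cur = conn.execute(
            "SELECT id FROM visits WHERE sessionid = ? AND url = ?", values
        )
        return cur.fetchone()[0], False


# Hypothetical values for one page visit.
row = ("nrq2vwxbtsjp8yoibotpsur0zit5jhoq", "https://www.example.com/example_url/")

first = get_or_create(row)
second = get_or_create(row)  # the losing side of the race
total = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]

print(first, second, total)  # (1, True) (1, False) 1
```

With the constraint in place the second insert fails instead of silently producing a duplicate, so the table ends up with a single row no matter how the two requests interleave.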

Paulo Almeida answered Sep 25 '22 14:09
