
Is get_or_create() thread safe?

I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.

Since I am only accessing through get_or_create(), I would imagine that I would only ever have one instance with a key to the session. However, I have found multiple instances with keys to the same session. What is happening? Is this a race condition, or does get_or_create() operate atomically?

Mantas Vidutis asked Jun 20 '11 19:06




3 Answers

NO, get_or_create is not atomic.

It first asks the database whether a matching row exists; the database answers and Python checks the result; if no row exists, it creates one. Between the get and the create anything can happen, including some other code creating a row that matches the get criteria.

For instance, with regard to your specific issue: if the user has two pages open (or several AJAX requests are performed) at the same time, every get may fail, and each request then creates a new row with the same session.

It is thus important to use get_or_create only when duplication will be caught by the database through a unique or unique_together constraint, so that even though multiple threads can reach save(), only one will succeed and the others will raise an IntegrityError that you can catch and deal with.

If you use get_or_create with (a set of) fields that are not unique in the database you will create duplicates in your database, which is rarely what you want.

More generally: do not rely on your application to enforce uniqueness and avoid duplicates in your database! That's the database's job! (Well, unless you wrap your critical functions in some OS-level locks, but I would still suggest using the database.)

With these warnings in mind, get_or_create, used correctly, is an easy-to-read, easy-to-write construct that perfectly complements the database integrity checks.
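To make the pattern above concrete, here is a minimal sketch that uses the standard-library sqlite3 module in place of the Django ORM. The table name `profile`, the column `session_id` and the helper function are hypothetical, chosen only for illustration. The point is that the UNIQUE constraint, not the application code, rejects the duplicate, and the caller recovers by catching IntegrityError and fetching the existing row.

```python
import sqlite3


def get_or_create(conn, session_id):
    """Return (row_id, created), relying on a UNIQUE constraint for safety."""
    query = "SELECT id FROM profile WHERE session_id = ?"
    row = conn.execute(query, (session_id,)).fetchone()
    if row is not None:
        return row[0], False
    try:
        cur = conn.execute(
            "INSERT INTO profile (session_id) VALUES (?)", (session_id,)
        )
        conn.commit()
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another writer inserted the row between our SELECT and our INSERT:
        # discard our attempt and return the row that won.
        conn.rollback()
        row = conn.execute(query, (session_id,)).fetchone()
        return row[0], False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile ("
    "id INTEGER PRIMARY KEY, session_id TEXT UNIQUE NOT NULL)"
)

first = get_or_create(conn, "abc")    # creates the row
second = get_or_create(conn, "abc")   # finds the existing row

# A raw duplicate insert is rejected by the database itself.
try:
    conn.execute("INSERT INTO profile (session_id) VALUES (?)", ("abc",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

In Django terms, the equivalent of the UNIQUE column would be declaring the session field with unique=True (or as a OneToOneField), so that the database enforces one row per session.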

Refs and citations:

  • http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95/0af3a41de4f4ce06
  • http://groups.google.com/group/django-developers/browse_thread/thread/f0b3381b2620e7db/8eae2f6087e550bb

Stefano answered Oct 04 '22 18:10


Actually, it's not thread-safe. You can look at the code of the get_or_create method of the QuerySet object; basically, what it does is the following:

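try:
    # Step 1: look for an existing row matching the lookup.
    return self.get(**lookup), False
except self.model.DoesNotExist:
    # Step 2: nothing found, so build a new object from the plain field kwargs.
    params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
    params.update(defaults)
    obj = self.model(**params)
    # Step 3: insert inside a savepoint; nothing prevents another thread
    # from having inserted the same row since step 1.
    sid = transaction.savepoint(using=self.db)
    obj.save(force_insert=True, using=self.db)
    transaction.savepoint_commit(sid, using=self.db)
    return obj, True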

So two threads might both find that the instance does not exist in the DB and each start creating a new one, then save them one after the other.
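The interleaving described above can be reproduced deterministically with a small simulation. This is only a sketch: a plain Python list stands in for a table without a unique constraint, and a threading.Barrier (an assumption made for the demo, not part of Django) forces both threads to finish their "get" before either performs its "create".

```python
import threading

rows = []                        # stand-in for a table with no unique constraint
results = []                     # the "created" flag reported by each thread
lock = threading.Lock()          # protects the shared lists only
barrier = threading.Barrier(2)   # makes both threads finish the get first


def naive_get_or_create(session):
    # The "get": look for an existing row for this session.
    existing = [r for r in rows if r["session"] == session]
    # Both threads stop here, so each has already seen an empty table.
    barrier.wait()
    if existing:
        created = False
    else:
        # The "create": each thread inserts its own row.
        with lock:
            rows.append({"session": session})
        created = True
    with lock:
        results.append(created)


threads = [
    threading.Thread(target=naive_get_or_create, args=("abc",))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows), results)
```

Both threads report that they created the row, and the list ends up with two entries for the same session, which matches the duplicates seen in the question.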

recamshak answered Oct 04 '22 18:10


Threading is one problem, but get_or_create is also broken for any serious usage under MySQL's default isolation level (REPEATABLE READ):

  • How do I deal with this race condition in django?
  • Why doesn't this loop display an updated object count every five seconds?
  • https://code.djangoproject.com/ticket/13906
  • http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
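As background to the links above: under REPEATABLE READ, a transaction keeps reading from the snapshot taken at its first read, so after an IntegrityError the follow-up get() may still not see the row that another transaction committed. One commonly suggested mitigation is to run the connection at READ COMMITTED. The fragment below is a sketch of a settings.py entry, assuming a Django version recent enough for its MySQL backend to accept the `isolation_level` option; the database name is a placeholder.

```python
# settings.py fragment (sketch; "mydb" is a hypothetical placeholder and
# credentials/host are omitted).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "OPTIONS": {
            # Ask the MySQL backend to use READ COMMITTED instead of the
            # server default, REPEATABLE READ.
            "isolation_level": "read committed",
        },
    }
}
```

This does not replace a unique constraint; it only lets the retrying get() see rows committed by other transactions.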

Tomasz Zieliński answered Oct 04 '22 17:10