I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.
Since I am only accessing it through get_or_create(), I would expect to only ever have one instance with a key to a given session. However, I have found multiple instances with keys to the same session. What is happening? Is this a race condition, or does get_or_create() operate atomically?
Since 2013 or so, get_or_create is atomic, so it handles concurrency nicely. To quote the documentation: "This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database."
Note that the Django ORM is explicitly thread-safe. There are multiple references in the documentation about threaded operation.
No, get_or_create is not atomic.
It first asks the database whether a matching row exists; the database returns, and Python checks the result; if no row exists, it creates one. Anything can happen between the get and the create, including some other code creating a row that matches the get criteria.
For instance, with regard to your specific issue: if the user has two pages open (or several AJAX requests are performed) at the same time, every get may fail, and every request will then create a new row with the same session.
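The interleaving described above can be reproduced without Django. The following is only a sketch using the standard library's sqlite3 module; the cart table and the get/create helpers are hypothetical stand-ins for the two halves of get_or_create, and the two "requests" are interleaved by hand rather than run in real threads so that the outcome is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: note that session_id has NO unique constraint.
conn.execute("CREATE TABLE cart (id INTEGER PRIMARY KEY, session_id TEXT)")


def get(session_id):
    """The 'get' half: return the matching row, or None if there is none."""
    return conn.execute(
        "SELECT id FROM cart WHERE session_id = ?", (session_id,)
    ).fetchone()


def create(session_id):
    """The 'create' half: insert a new row unconditionally."""
    conn.execute("INSERT INTO cart (session_id) VALUES (?)", (session_id,))


# Two requests for the same session, interleaved by hand.
found_a = get("abc")  # request A: no row yet
found_b = get("abc")  # request B: still no row, because A has not inserted
if found_a is None:
    create("abc")
if found_b is None:
    create("abc")

duplicates = conn.execute(
    "SELECT COUNT(*) FROM cart WHERE session_id = ?", ("abc",)
).fetchone()[0]
print(duplicates)  # 2
```

Both requests pass the existence check before either one inserts, so the table ends up with two rows for the same session.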
It is thus important to only use get_or_create when the duplication will be caught by the database through some unique/unique_together constraint, so that even though multiple threads can get to the point of save(), only one will succeed, and the others will raise an IntegrityError that you can catch and deal with.
If you use get_or_create with (a set of) fields that are not unique in the database, you will create duplicates in your database, which is rarely what you want.
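As a rough illustration of the pattern (database constraint first, then catch the IntegrityError), here is a sketch that again uses plain sqlite3 rather than Django. The cart table and the fetch, create_or_fetch and get_or_create helpers are hypothetical names; in Django itself the equivalent would be a unique=True field (or unique_together) combined with catching django.db.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the UNIQUE constraint lets the database reject duplicates.
conn.execute(
    "CREATE TABLE cart (id INTEGER PRIMARY KEY, session_id TEXT UNIQUE)"
)


def fetch(session_id):
    """Return the id of the row for this session, or None."""
    row = conn.execute(
        "SELECT id FROM cart WHERE session_id = ?", (session_id,)
    ).fetchone()
    return None if row is None else row[0]


def create_or_fetch(session_id):
    """The step a request runs after its 'get' found nothing."""
    try:
        cur = conn.execute(
            "INSERT INTO cart (session_id) VALUES (?)", (session_id,)
        )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another request inserted first; fall back to the existing row.
        return fetch(session_id), False


def get_or_create(session_id):
    """Return (row id, created), relying on the constraint for safety."""
    row_id = fetch(session_id)
    if row_id is not None:
        return row_id, False
    return create_or_fetch(session_id)


# Simulate the race: both requests saw no row, so both reach the create step.
result_a = create_or_fetch("abc")
result_b = create_or_fetch("abc")

count = conn.execute("SELECT COUNT(*) FROM cart").fetchone()[0]
print(result_a, result_b, count)
```

The second insert is rejected by the database, so the losing request ends up with the row the winner created and the table keeps a single row per session.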
More generally: do not rely on your application to enforce uniqueness and avoid duplicates in your database. That's the database's job! (Unless you wrap your critical functions in OS-level locks, but I would still suggest using the database.)
With these warnings in mind, get_or_create, used correctly, is an easy-to-read, easy-to-write construct that perfectly complements the database integrity checks.
Actually, it's not thread-safe. You can look at the code of the get_or_create method of the QuerySet object; basically, what it does is the following:
    try:
        return self.get(**lookup), False
    except self.model.DoesNotExist:
        params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
        params.update(defaults)
        obj = self.model(**params)
        sid = transaction.savepoint(using=self.db)
        obj.save(force_insert=True, using=self.db)
        transaction.savepoint_commit(sid, using=self.db)
        return obj, True
So two threads might each find that the instance does not exist in the database and each start creating a new one, then save them one after the other.
Threading is one problem, but get_or_create is also broken for any serious usage under the default isolation level of MySQL: