Django: how to do get_or_create() in a threadsafe way?

In my Django app very often I need to do something similar to get_or_create(). E.g.,

User submits a tag. Need to see if that tag already is in the database. If not, create a new record for it. If it is, just update the existing record.

But looking into the doc for get_or_create() it looks like it's not threadsafe. Thread A checks and finds Record X does not exist. Then Thread B checks and finds that Record X does not exist. Now both Thread A and Thread B will create a new Record X.
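
The interleaving described above can be reproduced deterministically in a small sketch. This is an illustration under stated assumptions, not Django code: a plain list stands in for a table without a unique constraint, and a `threading.Barrier` forces both threads to finish the check before either one creates. The function name `naive_get_or_create` is hypothetical.

```python
import threading

rows = []                       # stands in for a table with no unique constraint
barrier = threading.Barrier(2)  # makes both threads finish the check before either creates


def naive_get_or_create(tag):
    exists = tag in rows        # step 1: check whether the record exists
    barrier.wait()              # at this point both threads have seen "does not exist"
    if not exists:
        rows.append(tag)        # step 2: create, based on a stale check


threads = [threading.Thread(target=naive_get_or_create, args=("django",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(rows)  # the same tag has been created twice
```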

This must be a very common situation. How do I handle it in a threadsafe way?

Continuation asked Jul 05 '11 17:07



2 Answers

Since 2013 or so, get_or_create is atomic, so it handles concurrency nicely:

This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.

If you are using MySQL, be sure to use the READ COMMITTED isolation level rather than REPEATABLE READ (the default), otherwise you may see cases where get_or_create will raise an IntegrityError but the object won’t appear in a subsequent get() call.
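
As a sketch of how that isolation level might be configured, assuming the MySQL backend and a Django version that accepts an `isolation_level` key under `OPTIONS` (check the database notes for your version). The database name, user and password below are placeholders.

```python
# settings.py (fragment): NAME, USER and PASSWORD are placeholder values
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "change-me",
        "OPTIONS": {
            # Open connections at READ COMMITTED instead of REPEATABLE READ
            "isolation_level": "read committed",
        },
    }
}
```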

From: https://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create

Here's an example of how you could do it:

Define a model with either unique=True:

    from django.db import models

    class MyModel(models.Model):
        slug = models.SlugField(max_length=255, unique=True)
        name = models.CharField(max_length=255)

    # user_slug and user_name stand for the values submitted by the user
    obj, created = MyModel.objects.get_or_create(slug=user_slug, defaults={"name": user_name})

... or with unique_together:

    from django.db import models

    class MyModel(models.Model):
        prefix = models.CharField(max_length=3)
        slug = models.SlugField(max_length=255)
        name = models.CharField(max_length=255)

        class Meta:
            unique_together = ("prefix", "slug")

    # user_prefix, user_slug and user_name stand for the values submitted by the user
    obj, created = MyModel.objects.get_or_create(prefix=user_prefix, slug=user_slug, defaults={"name": user_name})

Note how the non-unique fields are in the defaults dict, NOT among the unique fields in get_or_create. This will ensure your creates are atomic.

Here's how it's implemented in Django: https://github.com/django/django/blob/fd60e6c8878986a102f0125d9cdf61c717605cf1/django/db/models/query.py#L466 - Look the object up; if it is missing, try creating it, catch any IntegrityError, and return the existing row in that case. In other words: handle atomicity in the database.
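
The same pattern can be sketched without Django, using the standard-library `sqlite3` module so it runs on its own. This is a simplified illustration, not Django's actual code: the `tag` table and the `get_or_create` helper are hypothetical, and the unique constraint on `slug` is what makes the fallback safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is what lets the database reject a duplicate insert.
conn.execute("CREATE TABLE tag (slug TEXT UNIQUE, name TEXT)")


def get_or_create(conn, slug, name):
    """Return (row, created). Hypothetical helper, not Django's implementation."""
    select = "SELECT slug, name FROM tag WHERE slug = ?"
    row = conn.execute(select, (slug,)).fetchone()
    if row is not None:
        return row, False
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute("INSERT INTO tag (slug, name) VALUES (?, ?)", (slug, name))
        return (slug, name), True
    except sqlite3.IntegrityError:
        # Another writer inserted the same slug between our SELECT and INSERT,
        # so fetch the row that won the race.
        return conn.execute(select, (slug,)).fetchone(), False


print(get_or_create(conn, "django", "Django"))
print(get_or_create(conn, "django", "ignored"))
```

The second call returns the existing row and ignores the new name, which mirrors how the `defaults` values are only used when a row is created.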

Emil Stenström answered Oct 25 '22 08:10


This must be a very common situation. How do I handle it in a threadsafe way?

Yes.

The "standard" solution in SQL is to simply attempt to create the record. If it works, that's good. Keep going.

If an attempt to create a record gets a "duplicate" exception from the RDBMS, then do a SELECT and keep going.
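
A minimal sketch of that insert-first approach, assuming a table with a unique constraint and using the standard-library `sqlite3` module. The `tag` table and the `create_or_get` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def create_or_get(conn, name):
    """Insert first; on a duplicate, fall back to a SELECT. Returns (id, created)."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            cur = conn.execute("INSERT INTO tag (name) VALUES (?)", (name,))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The database reported a duplicate, so the row already exists.
        row = conn.execute("SELECT id FROM tag WHERE name = ?", (name,)).fetchone()
        return row[0], False


first = create_or_get(conn, "linux")
second = create_or_get(conn, "linux")
print(first, second)
```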

Django, however, has an ORM layer with its own cache. So the logic is inverted to make the common case work directly and quickly, while the uncommon case (the duplicate) raises a rare exception.

S.Lott answered Oct 25 '22 08:10