We are running into a known issue in Django:
IntegrityError during Many To Many add()
There is a race condition if several processes/requests try to add the same row to a ManyToManyRelation.
How to work around this?
Environment:
How to reproduce it:
my_user.groups.add(foo_group)
The above fails if two requests execute this code at the same time. Here is the database table and the failing constraint:
myapp_egs_d=> \d auth_user_groups
 id       | integer | not null default ...
 user_id  | integer | not null
 group_id | integer | not null
Indexes:
    "auth_user_groups_pkey" PRIMARY KEY, btree (id)
fails ==> "auth_user_groups_user_id_group_id_key" UNIQUE CONSTRAINT, btree (user_id, group_id)
Since this only happens on production machines, and all of our production machines run PostgreSQL, a PostgreSQL-only solution would be acceptable.
Yes, the error can be reproduced. Let us use the famed Publication and Article models from the Django docs, and then create a few threads.
import random
import threading

# Article and Publication are the models from the Django docs example;
# import them from your own app's models module.


def populate():
    for i in range(100):
        Article.objects.create(headline='headline{0}'.format(i))
        Publication.objects.create(title='title{0}'.format(i))
    print('created objects')


class MyThread(threading.Thread):
    def run(self):
        for q in range(1, 100):
            for i in range(1, 5):
                pub = Publication.objects.all()[random.randint(1, 2)]
                for j in range(1, 5):
                    article = Article.objects.all()[random.randint(1, 15)]
                    # add() is the call that races on the unique constraint.
                    pub.article_set.add(article)
            print(self.name)


# Start from a clean slate, then let three threads race each other.
Article.objects.all().delete()
Publication.objects.all().delete()
populate()

thrd1 = MyThread()
thrd2 = MyThread()
thrd3 = MyThread()
thrd1.start()
thrd2.start()
thrd3.start()
You are sure to see unique key constraint violations of the type reported in the bug report. If you don't see them, try increasing the number of threads or iterations.
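The violations come from a check-then-insert race. Here is a minimal, deterministic sketch of that race, assuming add() behaves roughly like "look up the existing rows, then insert the missing ones". It is not Django's actual code: an in-memory SQLite table stands in for auth_user_groups, and the helper names is_member and insert_membership are made up for illustration.

```python
import sqlite3

# In-memory stand-in for auth_user_groups, with the same UNIQUE constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auth_user_groups ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL,"
    " group_id INTEGER NOT NULL,"
    " UNIQUE (user_id, group_id))"
)


def is_member(user_id, group_id):
    """Step 1 of check-then-insert: look for an existing row."""
    row = conn.execute(
        "SELECT 1 FROM auth_user_groups WHERE user_id = ? AND group_id = ?",
        (user_id, group_id),
    ).fetchone()
    return row is not None


def insert_membership(user_id, group_id):
    """Step 2: insert the row; the UNIQUE constraint rejects duplicates."""
    conn.execute(
        "INSERT INTO auth_user_groups (user_id, group_id) VALUES (?, ?)",
        (user_id, group_id),
    )


# Both "requests" run their check before either of them inserts.
request_a_sees_row = is_member(1, 7)
request_b_sees_row = is_member(1, 7)

insert_membership(1, 7)  # request A wins the race
try:
    insert_membership(1, 7)  # request B acts on its now stale check
    second_insert_failed = False
except sqlite3.IntegrityError:
    second_insert_failed = True

print(request_a_sees_row, request_b_sees_row, second_insert_failed)
```

Both checks report that the row is missing, so both requests try to insert, and the second insert is the one the unique constraint rejects.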
Yes, there is a workaround: use through models and get_or_create. Here is the models.py, adapted from the example in the Django docs.
from django.db import models


class Publication(models.Model):
    title = models.CharField(max_length=30)

    def __str__(self):  # __unicode__ on Python 2
        return self.title

    class Meta:
        ordering = ('title',)


class Article(models.Model):
    headline = models.CharField(max_length=100)
    publications = models.ManyToManyField(Publication, through='ArticlePublication')

    def __str__(self):  # __unicode__ on Python 2
        return self.headline

    class Meta:
        ordering = ('headline',)


class ArticlePublication(models.Model):
    article = models.ForeignKey('Article', on_delete=models.CASCADE)
    publication = models.ForeignKey('Publication', on_delete=models.CASCADE)

    class Meta:
        # Enforced at the database level; get_or_create relies on it.
        unique_together = ('article', 'publication')
Here is the new threading class which is a modification of the one above.
class MyThread2(threading.Thread):
    def run(self):
        for q in range(1, 100):
            for i in range(1, 5):
                pub = Publication.objects.all()[random.randint(1, 2)]
                for j in range(1, 5):
                    article = Article.objects.all()[random.randint(1, 15)]
                    # Create the link row only if it does not exist yet.
                    ap, c = ArticlePublication.objects.get_or_create(article=article, publication=pub)
            print('Get or create', self.name)
You will find that the exception no longer shows up. Feel free to increase the number of iterations. I only went up to 1000 with get_or_create and it didn't throw the exception, whereas add() usually threw an exception within 20 iterations.
This works because get_or_create is atomic. From the Django documentation:
This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.
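The retry idea behind such a method can be sketched without Django. This is only a simplified model under stated assumptions, not Django's implementation: SQLite stands in for the production database, the table article_publication and the functions find_link, insert_link and get_or_create_link are hypothetical, and the before_insert hook exists purely so the demo can simulate a competing writer.

```python
import sqlite3

# Autocommit connection to an in-memory table that mirrors the through table.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE article_publication ("
    " id INTEGER PRIMARY KEY,"
    " article_id INTEGER NOT NULL,"
    " publication_id INTEGER NOT NULL,"
    " UNIQUE (article_id, publication_id))"
)


def find_link(article_id, publication_id):
    """Return the id of the matching row, or None if there is none."""
    row = conn.execute(
        "SELECT id FROM article_publication"
        " WHERE article_id = ? AND publication_id = ?",
        (article_id, publication_id),
    ).fetchone()
    return None if row is None else row[0]


def insert_link(article_id, publication_id):
    """Insert a row and return its id; raises IntegrityError on a duplicate."""
    cursor = conn.execute(
        "INSERT INTO article_publication (article_id, publication_id)"
        " VALUES (?, ?)",
        (article_id, publication_id),
    )
    return cursor.lastrowid


def get_or_create_link(article_id, publication_id, before_insert=None):
    """Return (row_id, created), tolerating a concurrent insert."""
    link_id = find_link(article_id, publication_id)
    if link_id is not None:
        return link_id, False
    if before_insert is not None:
        before_insert()  # demo-only hook: a competing writer sneaks in here
    try:
        return insert_link(article_id, publication_id), True
    except sqlite3.IntegrityError:
        # We lost the race, so the row exists now: fetch it instead of failing.
        return find_link(article_id, publication_id), False


first = get_or_create_link(1, 2)   # row is created
second = get_or_create_link(1, 2)  # existing row is returned
raced = get_or_create_link(3, 4, before_insert=lambda: insert_link(3, 4))
print(first, second, raced)
```

The third call loses the simulated race but still returns the competitor's row with created set to False. Without the unique constraint the insert would never fail, and both writers would end up creating a row, which is the caveat the quoted passage describes.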
Update:
Thanks @louis for pointing out that the explicit through model can in fact be eliminated. Thus the get_or_create call in MyThread2 can be changed as follows:
ap, c = article.publications.through.objects.get_or_create(
    article=article, publication=pub)