
MultipleObjectsReturned with get_or_create

I'm writing a small Django command to copy data from a JSON API endpoint into a Django database. At the point where I actually create the objects, with obj, created = model.objects.get_or_create(**filters), I get a MultipleObjectsReturned error. This surprises me, because my understanding of get_or_create is that if I try to create an object that already exists, it will just 'get' it instead.

I'm not certain about the integrity of the database I'm cloning, but even if there are multiple identical objects in it, when I load them into my local Django database, shouldn't get_or_create make it so that I never get more than one copy?

Can anybody explain this? I'm happy to give more specifics, I just didn't want to bog the reader down.

Brian Peterson asked Jul 31 '13 02:07



3 Answers

Example code

Imagine you have the following model:

from django.db import models

class DictionaryEntry(models.Model):
    name = models.CharField(max_length=255, null=False, blank=False)
    definition = models.TextField(null=True, blank=False)

and the following code:

obj, created = DictionaryEntry.objects.get_or_create(
    name='apple', definition='some kind of fruit')

get_or_create

In case you have not seen the code for get_or_create:

 # simplified
 def get_or_create(cls, **kwargs):
     try:
         instance, created = cls.get(**kwargs), False
     except cls.DoesNotExist:
         instance, created = cls.create(**kwargs), True
     return instance, created

about webservers...

Now imagine a web server with two worker processes, each with its own connection to the database, running concurrently.

 # simplified
 def get_or_create(cls, **kwargs):
     try:
         instance, created = cls.get(**kwargs), False # <===== nope not there...
     except cls.DoesNotExist:
         instance, created = cls.create(**kwargs), True
     return instance, created

If the timing goes right (or wrong, depending on how you want to phrase it), both processes can do the lookup before either has inserted, so neither finds the item. Both then create it, and the table now holds two identical rows. Everything is fine...

MultipleObjectsReturned: get() returned more than one KeyValue -- it returned 2!

Everything is fine... until you call get_or_create a third time. "Third time's the charm", they say.

 # simplified
 def get_or_create(cls, **kwargs):
     try:
         instance, created = cls.get(**kwargs), False # <==== kaboom, 2 objects.
     except cls.DoesNotExist:
         instance, created = cls.create(**kwargs), True
     return instance, created
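
The interleaving above can be replayed deterministically without a web server. This is a minimal sketch that uses Python's built-in sqlite3 module in place of the Django ORM; the entry table and the lookup/insert helpers are hypothetical stand-ins for DictionaryEntry, get() and create(), and the two "workers" are simply interleaved by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No uniqueness constraint on (name, definition), as in the original model.
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, name TEXT, definition TEXT)"
)

def lookup(name, definition):
    """The 'get' half: return the ids of all matching rows."""
    rows = conn.execute(
        "SELECT id FROM entry WHERE name = ? AND definition = ?",
        (name, definition),
    ).fetchall()
    return [row[0] for row in rows]

def insert(name, definition):
    """The 'create' half: insert a new row unconditionally."""
    conn.execute(
        "INSERT INTO entry (name, definition) VALUES (?, ?)",
        (name, definition),
    )

args = ("apple", "some kind of fruit")

# Both workers run their lookup before either one has inserted.
found_by_a = lookup(*args)
found_by_b = lookup(*args)

# Each worker saw nothing, so each one inserts.
if not found_by_a:
    insert(*args)
if not found_by_b:
    insert(*args)

# A third caller now finds two matching rows.
matches = lookup(*args)
```

With no uniqueness constraint, nothing stops the second insert, so any later lookup sees two rows. That is the situation in which Django's get() raises MultipleObjectsReturned.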

unique_together

How could you solve this? Maybe enforce a constraint at the database level:

class DictionaryEntry(models.Model):
    name = models.CharField(max_length=255, null=False, blank=False)
    definition = models.TextField(null=True, blank=False)
    class Meta:
        unique_together = (('name', 'definition'),)

back to the function:

 # simplified
 def get_or_create(cls, **kwargs):
     try:
         instance, created = cls.get(**kwargs), False
     except cls.DoesNotExist:
         instance, created = cls.create(**kwargs), True # <==== the loser of the race gets an IntegrityError here
     return instance, created

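Say you have the same race as before: both processes fail to find the item and proceed to the insert. Each insert runs in its own transaction; one of them wins the race and the other gets an IntegrityError from the database. Django's real get_or_create catches that IntegrityError and retries the get, so the loser ends up with the row the winner created.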
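
The effect of the constraint can be sketched outside Django as well. This is a minimal illustration using the standard sqlite3 module, not Django's implementation; the entry table and the get/create_or_fetch helpers are hypothetical names, and the two racing workers are again interleaved by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " definition TEXT NOT NULL,"
    " UNIQUE (name, definition))"
)

def get(name, definition):
    """Return the id of the matching row, or None if there is none."""
    row = conn.execute(
        "SELECT id FROM entry WHERE name = ? AND definition = ?",
        (name, definition),
    ).fetchone()
    return row[0] if row else None

def create_or_fetch(name, definition):
    """Insert a row; if the constraint rejects it, fetch the existing row."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO entry (name, definition) VALUES (?, ?)",
                (name, definition),
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another writer got there first: retry the lookup instead.
        return get(name, definition), False

args = ("apple", "some kind of fruit")

# Both workers look up the row before either has inserted.
seen_by_a = get(*args)
seen_by_b = get(*args)

# Both saw nothing, so both attempt the insert.
winner = create_or_fetch(*args)
loser = create_or_fetch(*args)
row_count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
```

The second insert is rejected by the database, so the table never holds more than one matching row and both callers end up with the same id.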

MySQL?

The example uses a TextField, which MySQL maps to LONGTEXT (in my case). Adding the unique_together constraint makes syncdb fail:

django.db.utils.InternalError: (1170, u"BLOB/TEXT column 'definition' used in key specification without a key length")

So no luck there: with a TextField on MySQL, you may have to deal with MultipleObjectsReturned manually.
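
One way to deal with it manually is to tolerate duplicates on read instead of calling get(). A minimal sketch, assuming Django 1.6 or later (for QuerySet.first()); get_first_or_create is a hypothetical helper name, not part of Django, and model stands for any model class such as DictionaryEntry.

```python
def get_first_or_create(model, **filters):
    """Like get_or_create, but return the match with the lowest primary key
    instead of raising MultipleObjectsReturned when duplicates exist.

    Hypothetical helper: `model` is any Django model class.
    """
    # filter() never raises for multiple matches; ordering by primary key
    # makes every caller pick the same row out of a set of duplicates.
    obj = model.objects.filter(**filters).order_by("pk").first()
    if obj is not None:
        return obj, False
    # Still racy: two callers can both reach this line and both insert.
    return model.objects.create(**filters), True
```

This does not prevent duplicates from being created. It only keeps reads working when they exist, so a clean-up of duplicate rows may still be needed.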

  • https://code.djangoproject.com/ticket/2495
  • https://code.djangoproject.com/ticket/12579
  • http://django.readthedocs.org/en/latest/topics/db/transactions.html#using-a-high-isolation-level
  • https://docs.djangoproject.com/en/dev/topics/db/transactions/#django.db.transaction.atomic

possible solutions

  • It may be possible to replace the TextField with a CharField.
  • It may be possible to add a CharField that holds a strong hash of the TextField, computed in pre_save, and to use that field in unique_together instead of the TextField.
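
The hash idea in the second bullet might look like this. A minimal sketch using only hashlib; definition_digest is a hypothetical helper, and both the choice of SHA-256 and the decision to treat a missing definition as an empty string are assumptions. In Django the result would be stored in a CharField(max_length=64) that is filled in before saving and listed in unique_together in place of the TextField.

```python
import hashlib

def definition_digest(definition):
    """Return a 64-character SHA-256 hex digest of a text value."""
    # Assumption: None and "" should count as the same definition.
    text = definition or ""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

digest = definition_digest("some kind of fruit")
same = definition_digest("some kind of fruit")
other = definition_digest("some kind of vegetable")
```

Because the digest has a fixed length, it fits in an ordinary indexed column, which avoids the key-length error that MySQL reports for a TextField.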

dnozay answered Oct 05 '22 16:10



As the name implies, get_or_create either calls model.objects.get() or, if nothing matches, model.objects.create().

It's conceptually equivalent to:

try:
    model.objects.get(pk=1)
except model.DoesNotExist:
    model.objects.create(pk=1)

The source is where you find definitive answers to these types of questions (hint: search for def get_or_create). As you can see, the function only catches DoesNotExist in its try/except, so a MultipleObjectsReturned raised by get() propagates straight to the caller.

def get_or_create(self, **kwargs):
    """
    Looks up an object with the given kwargs, creating one if necessary.
    Returns a tuple of (object, created), where created is a boolean
    specifying whether an object was created.
    """
    assert kwargs, \
            'get_or_create() must be passed at least one keyword argument'
    defaults = kwargs.pop('defaults', {})
    lookup = kwargs.copy()
    for f in self.model._meta.fields:
        if f.attname in lookup:
            lookup[f.name] = lookup.pop(f.attname)
    try:
        self._for_write = True
        return self.get(**lookup), False
    except self.model.DoesNotExist:
        ...  # the create branch follows here; the excerpt is cut off at this point

Yuji 'Tomita' Tomita answered Oct 05 '22 16:10



Another situation that can cause a MultipleObjectsReturned error with get_or_create() is multiple threads calling it at the same time with the same set of query parameters.

Relying solely on try/except in Python is not enough to guarantee a unique row. If you use this API, you should have a matching uniqueness constraint on the appropriate columns in the database.

See: https://code.djangoproject.com/ticket/12579

AdvilUser answered Oct 05 '22 15:10
