I have a model like the following:
class Foo(models.Model):
fruit = models.CharField(max_length=10)
stuff = models.CharField(max_length=10)
color = models.CharField(max_length=10)
owner = models.CharField(max_length=20)
exists = models.BooleanField()
class Meta:
unique_together = (('fruit', 'stuff', 'color'), )
It is populated with some data:
fruit stuff color owner exists
Apple Table Blue abc True
Pear Book Red xyz False
Pear Phone Green xyz False
Apple Phone Blue abc True
Pear Table Green abc True
I need to merge/join this with a collection (not a queryset):
[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
So basically rows 0 and 2 should return when I search this model with this list of tuples.
Currently my workaround is to read Foo.objects.all()
into a DataFrame and do a merge with the list of tuples and get the ID's to pass to Foo.objects.filter()
. I also tried iterating over the list and calling Foo.object.get()
on each tuple but it is very slow. The list is quite big.
When I tried chaining Q's as suggested by the current answers, it threw an OperationalError (too many SQL variables).
My main goal is the following:
As it can be seen from the model these three fields together form my primary key. The table contains around 15k entries. When I get data from another source I need to check if the data is already in my table and create/update/delete accordingly (new data may contain up to 15k entries). Is there a clean and efficient way to check if these records are already in my table?
Note: The list of tuples does not have to be in that shape. I can modify it, turn it into another data structure or transpose it.
You have ('fruit', 'stuff', 'color')
field unique together
So if your search tuple is ('Apple', 'Table', 'Blue')
and we concatenate it then also it will be a unique string
f = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
c = [''.join(w) for w in f]
# Output: ['AppleTableBlue', 'PearPhoneGreen']
So we can filter queryset on annotations and make use of Concat.
Foo.objects.annotate(u_key=Concat('fruit', 'stuff', 'color', output_field=CharField())).filter(u_key__in=c)
# Output: <QuerySet [<Foo: #0row >, <Foo: #2row>]>
This will work for tuple and list
If input is list of 2 tuple:
[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
after transpose input will be:
transpose_input = [('Apple', 'Pear'), ('Table', 'Phone'), ('Blue', 'Green')]
We can easily identify by counting each_tuple_size and input_list_size that the input is transposed. so we can use zip to transpose it again and the above solution will work as expected.
if each_tuple_size == 2 and input_list_size == 3:
transpose_again = list(zip(*transpose_input))
# use *transpose_again* variable further
If input is list of 3 tuple:
[('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green'), ('Pear', 'Book', 'Red')]
After transpose input will be:
transpose_input = [('Apple', 'Pear', 'Pear'), ('Table', 'Phone', 'Book'), ('Blue', 'Green', 'Red')]
So it is impossible to identify that the input is transposed for every
n*n
matrix and above solution will Fail
If you know these fields constitute your natural key and you have to do heavy querying on them, add this natural key as a proper field and take measures to maintain it:
class FooQuerySet(models.QuerySet):
def bulk_create(self, objs, batch_size=None):
objs = list(objs)
for obj in objs:
obj.natural_key = Foo.get_natural_key(obj.fruit, obj.stuff, obj.color)
return super(FooQuerySet, self).bulk_create(objs, batch_size=batch_size)
# you might override update(...) with proper F and Value expressions,
# but I assume the natural key does not change
class FooManager(models.Manager):
def get_queryset(self):
return FooQuerySet(self.model, using=self._db)
class Foo(models.Model):
NK_SEP = '|||' # sth unlikely to occur in the other fields
fruit = models.CharField(max_length=10)
stuff = models.CharField(max_length=10)
color = models.CharField(max_length=10)
natural_key = models.CharField(max_length=40, unique=True, db_index=True)
@staticmethod
def get_natural_key(*args):
return Foo.NK_SEP.join(args)
def save(self, *args, **kwargs):
self.natural_key = Foo.get_natural_key(self.fruit, self.stuff, self.color)
Super(Foo, self).save(*args, **kwargs)
objects = FooManager()
class Meta:
unique_together = (('fruit', 'stuff', 'color'), )
Now you can query:
from itertools import starmap
lst = [('Apple', 'Table', 'Blue'), ('Pear', 'Phone', 'Green')]
existing_foos = Foo.objects.filter(natural_key__in=list(starmap(Foo.get_natural_key, lst)))
And batch create:
Foo.objects.bulk_create(
[
Foo(fruit=x[0], stuff=x[1], color=x[2])
for x in lst
if x not in set(existing_foos.values_list('fruit', 'stuff', 'color'))
]
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With